Dealing with complete separation and quasi-complete separation in logistic regression for linguistic data

Research Methods in Applied Linguistics (2023)

Abstract
Logistic regression is a powerful and widely used analytical tool in linguistics for modelling a binary outcome variable against a set of explanatory variables. One challenge that can arise when applying logistic regression to linguistic data is complete or quasi-complete separation, phenomena that occur when (paradoxically) the model has too much explanatory power, resulting in effectively infinite coefficient estimates and standard errors. Instead of seeing this as a drawback of the method, or naïvely removing covariates that cause separation, we demonstrate a straightforward and user-friendly modification of logistic regression, based on penalising the coefficient estimates, that is capable of systematically handling separation. We illustrate the use of penalised, multi-level logistic regression on two clustered datasets relating to second language acquisition and corpus data, showing in both cases how penalisation remedies the problem of separation and thus allows sensible and valid statistical conclusions to be drawn. We also show via simulation that results are not overly sensitive to the amount of penalisation employed for handling separation.
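To make the separation problem concrete, the sketch below fits a ridge-penalised logistic regression on a small synthetic dataset exhibiting complete separation. This is only an illustration under assumed choices (scikit-learn, an L2 penalty, a single-level model on made-up data); the paper's own approach is a penalised, multi-level logistic regression whose specific penalty and random-effects structure are not reproduced here.

```python
# Illustrative sketch, not the authors' exact method: ridge-penalised logistic
# regression on a synthetic, completely separated dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data with complete separation: y equals 1 exactly when x > 0,
# so the unpenalised maximum-likelihood coefficient diverges towards infinity.
x = rng.normal(size=(200, 1))
y = (x[:, 0] > 0).astype(int)

# Near-unpenalised fit (very weak penalty, large C): the coefficient blows up,
# mimicking the effectively infinite estimates that separation produces.
weak = LogisticRegression(C=1e8, max_iter=10_000).fit(x, y)

# Moderately penalised fit: the coefficient stays finite and interpretable.
penalised = LogisticRegression(C=1.0).fit(x, y)

print("near-unpenalised coefficient:", weak.coef_[0, 0])
print("penalised coefficient:       ", penalised.coef_[0, 0])
```

The penalty strength (here the hypothetical choice C=1.0) plays the role of the amount of penalisation whose influence the paper probes via simulation, reporting that conclusions are not overly sensitive to it.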
Keywords
Categorical data, Complete separation, Corpus data, Logistic regression, Multi-level models, Multinomial regression, Random effects, Second language acquisition