Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a).

PeerJ (2018)

Abstract
We comment on Eichstaedt et al.'s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of US counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with "negative" language being associated with higher rates of death from AHD and "positive" language associated with lower rates. First, we examine some of Eichstaedt et al.'s apparent assumptions about the nature of AHD, as well as some issues related to the secondary analysis of online data and to considering counties as communities. Next, using the data files supplied by Eichstaedt et al., we reproduce their regression- and correlation-based models, substituting mortality from an alternative cause of death (namely, suicide) as the outcome variable, and observe that the purported associations between "negative" and "positive" language and mortality are reversed when suicide is used as the outcome variable. We identify numerous other conceptual and methodological limitations that call into question the robustness and generalizability of Eichstaedt et al.'s claims, even when these are based on the results of their ridge regression/machine learning model. We conclude that there is no good evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.
Keywords
Heart disease, Risk factors, Well-being, Big data, Artifacts, Emotions, Social media, False positives, Language