Words of Suicide - Identifying Suicidal Risk in Written Communications.

IEEE BigData (2021)

Abstract
Suicide is a global health problem, with more than 700,000 individuals dying by self-destruction each year, yet it is classified as a low base rate behavior that is difficult to predict. Aiming to advance suicide prediction and prevention, we examined the potential use of machine learning and text analysis models to predict suicide risk based on written communications. Specifically, we used a dataset consisting of more than 27,000 general writings unrelated to suicide, 193 genuine suicide notes from individuals who died by suicide, and an additional 89 suicide posts shared on subreddits for an in-the-wild test, to examine the prediction accuracy of two machine learning models (SVM and RoBERTa) and a linguistic marker model. Our tests showed that the machine learning models performed better than the linguistic marker model on the test data. However, the linguistic marker model performed better in the wild, correctly classifying 88% of written communications as a "high risk of suicide", versus 56% and 70% for the machine learning models. The best in-the-wild performing model was adopted in an online suicide risk assessment tool called Edwin, named in honor of Edwin Shneidman for his numerous contributions to the field of suicidology. Finally, discrepancies between training and real-world data, vocabulary variation across domains, and the limited number of benchmarks constitute limitations that need to be addressed in future research.
Keywords
Suicide, machine learning, linguistic marker, RoBERTa, SVM