Determining the prevalence of cannabis, tobacco, and vaping device mentions in online communities using natural language processing

Drug and Alcohol Dependence (2021)

Abstract
Introduction: The relationship between cannabis, tobacco, and vaping devices is both rapidly changing and poorly understood, with consumers shifting quickly between use of all three product types. Given this dynamic and evolving landscape, there is an urgent need to monitor and better understand co-use, dual-use, and transition patterns between these products. This study describes work that uses social media (in this case, Reddit) in conjunction with automated Natural Language Processing (NLP) methods to better understand cannabis, tobacco, and vaping device product usage patterns.

Methods: We collected Reddit data from the period 2013-2018, sourced from eight popular, high-volume Reddit communities (subreddits) related to the three product categories. We then manually annotated (coded) a set of 2640 Reddit posts and trained a machine learning-based NLP algorithm to automatically identify and disambiguate between cannabis or tobacco mentions (both smoking and vaping) in Reddit posts. This classifier was then applied to all data derived from the eight subreddits, 767,788 posts in total.

Results: The NLP algorithm achieved moderate performance (overall F-score of 0.77). When applied to our large corpus of Reddit posts, we discovered that over 10% of posts in the smoking cessation subreddit r/stopsmoking were classified as referring to vaping nicotine, and that only 2% of posts from the subreddits r/electronic_cigarette and r/vaping were classified as referring to smoking (tobacco) cessation.

Conclusions: This study presents the results of applying an NLP algorithm designed to identify and distinguish between cannabis and tobacco mentions (both smoking and vaping) in Reddit posts, hence contributing to our currently limited understanding of co-use, dual-use, and transition patterns between these products.
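The two quantities reported in the Results (an F-score on annotated posts and the share of posts per subreddit assigned to a given class) can be illustrated with a minimal sketch. It is not the study's code: the label names (`vape_nicotine`, `smoke_tobacco`, `other`), the function names, and the toy data are all hypothetical, and because the abstract does not say how the overall F-score was averaged across classes, only a per-label F1 is shown.

```python
from collections import Counter, defaultdict


def f_score(gold, pred, label):
    """Per-label F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def label_share_by_subreddit(posts):
    """Map each subreddit to the fraction of its posts given each label.

    `posts` is an iterable of (subreddit, predicted_label) pairs.
    """
    counts = defaultdict(Counter)
    for subreddit, label in posts:
        counts[subreddit][label] += 1
    shares = {}
    for subreddit, counter in counts.items():
        total = sum(counter.values())
        shares[subreddit] = {lab: n / total for lab, n in counter.items()}
    return shares


# Toy annotated sample (hypothetical labels, not the study's data).
gold = ["vape_nicotine", "smoke_tobacco", "vape_nicotine", "other"]
pred = ["vape_nicotine", "vape_nicotine", "vape_nicotine", "other"]
vape_f1 = f_score(gold, pred, "vape_nicotine")

# Toy classifier output for one subreddit (hypothetical).
classified = [
    ("stopsmoking", "vape_nicotine"),
    ("stopsmoking", "other"),
    ("stopsmoking", "other"),
    ("stopsmoking", "other"),
]
shares = label_share_by_subreddit(classified)
```

In this sketch, a figure such as "over 10% of posts in r/stopsmoking" would correspond to reading `shares["stopsmoking"]["vape_nicotine"]` after running the classifier over the full corpus.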
Keywords
Cannabis, Tobacco, Social media, Natural language processing