TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak Supervision

arxiv(2022)

Abstract
Social media is often used as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster, and the filtered tweets are sent for human annotation. Human annotation to create labeled sets for machine learning models is laborious, time-consuming, at times inaccurate, and, more importantly, not scalable in terms of size and real-time use. In this work, we curated a silver-standard dataset using weak supervision. To validate its utility, we train machine learning models on the weakly supervised data to identify three different types of natural disasters, i.e., earthquakes, hurricanes, and floods. Our results demonstrate that models trained on the silver-standard dataset achieved performance greater than 90% when classifying a manually curated, gold-standard dataset. To enable reproducible research and additional downstream utility, we release the silver-standard dataset to the scientific community.
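
The abstract does not say which labeling heuristics or label model were used to build the silver-standard set, so the following is only a minimal sketch of the general weak-supervision idea under stated assumptions. It assumes hypothetical keyword-based labeling functions (one per keyword) whose votes are combined by simple majority vote, a plain stand-in for a learned label model; the names `KEYWORDS`, `make_keyword_lf`, `weak_label`, and `build_silver_set` and the keyword lists are all illustrative and not taken from the paper.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

ABSTAIN = None  # a labeling function returns None when it has no opinion

# Hypothetical keyword lists, for illustration only (not from the paper).
KEYWORDS = {
    "earthquake": ["earthquake", "quake", "aftershock", "magnitude"],
    "hurricane": ["hurricane", "landfall", "eyewall"],
    "flood": ["flood", "flooding", "floodwater"],
}

LabelingFunction = Callable[[List[str]], Optional[str]]


def make_keyword_lf(label: str, keyword: str) -> LabelingFunction:
    """Build a labeling function that votes `label` when `keyword` is a token of the tweet."""
    def lf(tokens: List[str]) -> Optional[str]:
        return label if keyword in tokens else ABSTAIN
    return lf


# One labeling function per keyword, so a tweet can collect several votes per class.
LFS: List[LabelingFunction] = [
    make_keyword_lf(label, kw) for label, kws in KEYWORDS.items() for kw in kws
]


def weak_label(text: str) -> Optional[str]:
    """Majority vote over non-abstaining labeling functions; abstain on no votes or a tie."""
    tokens = re.findall(r"[a-z]+", text.lower())
    votes = [v for v in (lf(tokens) for lf in LFS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence: leave the tweet unlabeled
    return ranked[0][0]


def build_silver_set(tweets: List[str]) -> List[Tuple[str, str]]:
    """Keep only tweets that received a weak label, paired with that label."""
    silver = []
    for tweet in tweets:
        label = weak_label(tweet)
        if label is not ABSTAIN:
            silver.append((tweet, label))
    return silver
```

For example, `weak_label("Magnitude 6.1 earthquake felt downtown")` collects two earthquake votes and returns `"earthquake"`, while a tweet mentioning both flooding and a hurricane gets one vote each and is left unlabeled. Abstaining on ties trades coverage for label precision; a classifier trained on the resulting silver set would then be evaluated against a manually labeled gold set, as the abstract describes.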
Keywords
large Twitter dataset, natural disasters, TweetDIS