Real-Time Top-R Topic Detection On Twitter With Topic Hijack Filtering

KDD (2015)

Abstract
Twitter is a "what's-happening-right-now" tool that enables interested parties to follow the thoughts and commentary of individual users in nearly real time. Although it is a valuable source of information for real-time topic detection and tracking, Twitter data are not clean: noisy messages and users significantly diminish the reliability of the results obtained. In this paper, we integrate the extraction of meaningful topics and the filtering of messages over the Twitter stream. We develop a streaming algorithm for a sequence of document-frequency tables; our algorithm enables real-time monitoring of the top-10 topics from approximately 25% of all Twitter messages, while automatically filtering out noisy and meaningless topics. We apply the proposed streaming algorithm to the Japanese Twitter stream and demonstrate that, compared with other online nonnegative matrix factorization methods, our framework tracks real-world events with high accuracy in terms of perplexity while simultaneously eliminating irrelevant topics.
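The abstract frames topic tracking as factorizing a stream of document-frequency tables, compared against online nonnegative matrix factorization (NMF). As a rough illustration only, the sketch below shows a generic online NMF: each incoming batch X_t (documents × vocabulary) is approximated as W_t H, where the topic-word matrix H is shared across batches and updated from decayed running statistics. This is not the authors' algorithm. The class name `OnlineNMF`, the forgetting factor `forget`, the Lee-Seung style multiplicative updates, and the sufficient-statistics scheme are all assumptions, and no topic-hijack filtering is performed.

```python
import random

EPS = 1e-9  # guards against division by zero in multiplicative updates


def matmul(a, b):
    """Multiply two matrices stored as lists of lists: (n x m) @ (m x p)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


class OnlineNMF:
    """Hypothetical online NMF: X_t (docs x vocab) ~ W_t (docs x k) @ H (k x vocab)."""

    def __init__(self, n_topics, vocab_size, forget=0.9, seed=0):
        rng = random.Random(seed)
        self.k, self.v, self.forget = n_topics, vocab_size, forget
        # Random positive init breaks the symmetry between topics.
        self.H = [[rng.random() + 0.1 for _ in range(vocab_size)]
                  for _ in range(n_topics)]
        # Decayed running statistics: A = sum W^T W (k x k), B = sum W^T X (k x V).
        self.A = [[0.0] * n_topics for _ in range(n_topics)]
        self.B = [[0.0] * vocab_size for _ in range(n_topics)]

    def _fit_weights(self, X, iters=50):
        """Fit document-topic weights W for batch X with H held fixed."""
        Ht = transpose(self.H)
        XHt = matmul(X, Ht)        # n x k numerator
        HHt = matmul(self.H, Ht)   # k x k
        W = [[1.0] * self.k for _ in X]
        for _ in range(iters):
            WHHt = matmul(W, HHt)  # n x k denominator
            # Multiplicative update: W <- W * (X H^T) / (W H H^T)
            W = [[w * num / (den + EPS) for w, num, den in zip(wr, nr, dr)]
                 for wr, nr, dr in zip(W, XHt, WHHt)]
        return W

    def partial_fit(self, X, iters=50):
        """Consume one document-frequency batch; return its topic weights."""
        W = self._fit_weights(X, iters)
        Wt = transpose(W)
        WtW, WtX = matmul(Wt, W), matmul(Wt, X)
        # Decay old statistics so recent batches dominate, then add this batch.
        self.A = [[self.forget * a + n for a, n in zip(ra, rn)]
                  for ra, rn in zip(self.A, WtW)]
        self.B = [[self.forget * b + n for b, n in zip(rb, rn)]
                  for rb, rn in zip(self.B, WtX)]
        for _ in range(iters):
            AH = matmul(self.A, self.H)
            # Multiplicative update: H <- H * B / (A H)
            self.H = [[h * b / (d + EPS) for h, b, d in zip(rh, rb, rd)]
                      for rh, rb, rd in zip(self.H, self.B, AH)]
        return W

    def top_words(self, vocab, n=3):
        """Return the n highest-weighted words of each topic."""
        return [[vocab[j] for j in sorted(range(self.v), key=lambda j: -row[j])[:n]]
                for row in self.H]
```

Usage: create `OnlineNMF(n_topics=10, vocab_size=len(vocab))`, call `partial_fit` on each time window's document-frequency table, and read `top_words(vocab)` to monitor the current topics. The forgetting factor is one simple way to let the topics drift with the stream; the paper's own mechanism for this, and for filtering hijacked topics, may differ.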
Keywords
Twitter, topic detection, streaming algorithm, nonnegative matrix factorization, noise filtering