On-line relevant anomaly detection in the Twitter stream: an efficient bursty keyword detection model

KDD '13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, August 2013

Abstract
On-line social networks have become a massive communication and information channel for users worldwide. In particular, the microblogging platform Twitter is characterized by short-text message exchanges at extremely high rates. In this type of scenario, the detection of emerging topics in text streams becomes an important research area, essential for identifying relevant new conversation topics such as breaking news and trends. Although emerging topic detection in text is a well-established research area, its application to large volumes of streaming text data is quite novel. This makes scalability, efficiency, and speed the key requirements for any emerging topic detection algorithm in this type of environment. Our research addresses this problem by focusing on detecting significant and unusual bursts in keyword arrival rates, or bursty keywords. We propose a scalable and fast on-line method that uses normalized individual frequency signals per term and a windowing variation technique. This method reports keyword bursts, which can be composed of single or multiple terms, ranked according to their importance. The average complexity of our method is O(n log n), where n is the number of messages in the time window. This complexity allows our approach to scale to large streaming datasets. If bursts are only detected and not ranked, the algorithm has linear complexity, O(n), making it the fastest compared with the current state of the art. We validate our approach by comparing its performance with similar systems on the TREC Tweet 2011 Challenge tweets, obtaining a 91% match with LDA, an off-line gold standard used in similar evaluations. In addition, we study Twitter messages related to the Super Bowl football events in 2011 and 2013.
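The abstract does not give the exact normalization or variation formula, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a term's normalized frequency is the fraction of messages in a window that contain it, and that a term is "bursty" when that fraction rises by more than a fixed threshold between the previous and current window. The names `term_rates`, `bursty_keywords`, `threshold`, and `rank` are hypothetical. Only single-term bursts are handled; grouping co-occurring terms into multi-term bursts is omitted. Counting is a single pass over the messages (linear in n when message length is bounded, as with short tweets); the optional ranking step is the only sort, which mirrors the O(n) versus O(n log n) distinction in the abstract.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def term_rates(messages: Iterable[str]) -> Tuple[Counter, int]:
    """Hypothetical helper: count, per term, how many messages in a window
    contain it, and return those counts with the number of messages."""
    counts: Counter = Counter()
    n = 0
    for msg in messages:
        n += 1
        # Count each term at most once per message (assumed normalization).
        counts.update(set(msg.lower().split()))
    return counts, n


def bursty_keywords(previous: List[str], current: List[str],
                    threshold: float = 0.1,
                    rank: bool = True) -> List[Tuple[str, float]]:
    """Hypothetical windowing-variation score: flag terms whose normalized
    frequency grows by more than `threshold` from the previous window to the
    current one. Detection alone is a linear pass; ranking adds one sort."""
    prev_counts, prev_n = term_rates(previous)
    cur_counts, cur_n = term_rates(current)
    bursts = []
    for term, count in cur_counts.items():
        cur_rate = count / cur_n
        # Counter returns 0 for unseen terms, so new terms start from rate 0.
        prev_rate = prev_counts[term] / prev_n if prev_n else 0.0
        variation = cur_rate - prev_rate
        if variation > threshold:
            bursts.append((term, variation))
    if rank:
        # The only super-linear step: O(k log k) for k candidate bursts.
        bursts.sort(key=lambda item: item[1], reverse=True)
    return bursts
```

For example, with a quiet previous window (`["nice weather today", "coffee time", "weather is nice"]`) and a current window dominated by game chatter (`["touchdown super bowl", "what a touchdown", "super bowl touchdown", "coffee time"]`), calling `bursty_keywords(previous, current, threshold=0.3)` flags `touchdown`, `super`, and `bowl`, with `touchdown` ranked first at a variation of 0.75. Passing `rank=False` returns the same bursts unsorted, skipping the sort entirely.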