Clustering Microtext Streams for Event Identification.

IJCNLP(2013)

Abstract
The popularity of microblogging systems has resulted in a new form of Web data – microtext – which is very different from conventional well-written text. Microtext is often informal, brief, and grammatically varied, which poses new challenges in applying traditional clustering algorithms to it. In this paper, we propose a novel two-phase approach for clustering streaming microtext, in particular Twitter messages, into event-based clusters. In the online phase, an incremental process is applied to discover base clusters and maintain detailed summary statistics. On demand, for any user-specified time horizon, an offline phase is triggered to merge related clusters together. We demonstrate that our proposed approach can achieve better clustering accuracy than state-of-the-art methods.
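The two-phase idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the summary statistics, or the merge rule, so the cosine similarity over term counts, the `BaseCluster` fields, and the parameters `assign_threshold` and `merge_threshold` below are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class BaseCluster:
    """Hypothetical summary statistics kept per base cluster."""
    terms: Counter = field(default_factory=Counter)  # summed term counts
    size: int = 0              # number of messages absorbed
    first_seen: float = 0.0    # timestamp of the earliest message
    last_seen: float = 0.0     # timestamp of the latest message

    def add(self, tokens, timestamp):
        if self.size == 0:
            self.first_seen = timestamp
        self.terms.update(tokens)
        self.size += 1
        self.last_seen = timestamp  # assumes messages arrive in time order


def online_phase(stream, assign_threshold=0.6):
    """Incrementally assign each (timestamp, text) message to a base cluster."""
    clusters = []
    for timestamp, text in stream:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster.terms)
            if sim > best_sim:
                best, best_sim = cluster, sim
        # Open a new base cluster when nothing is similar enough.
        if best is None or best_sim < assign_threshold:
            best = BaseCluster()
            clusters.append(best)
        best.add(tokens, timestamp)
    return clusters


def offline_phase(clusters, start, end, merge_threshold=0.3):
    """Merge related base clusters that overlap the time horizon [start, end]."""
    active = [c for c in clusters if c.last_seen >= start and c.first_seen <= end]
    merged = []
    for c in active:
        for m in merged:
            if cosine(c.terms, m.terms) >= merge_threshold:
                m.terms.update(c.terms)
                m.size += c.size
                m.first_seen = min(m.first_seen, c.first_seen)
                m.last_seen = max(m.last_seen, c.last_seen)
                break
        else:
            # Copy so the stored base clusters stay untouched for later queries.
            merged.append(BaseCluster(Counter(c.terms), c.size, c.first_seen, c.last_seen))
    return merged
```

In this sketch the online phase uses a stricter threshold so that base clusters stay fine-grained, and the offline phase uses a looser one to fold related base clusters into a single event for the requested horizon. Because the offline step copies the summaries rather than mutating them, the same base clusters can be queried again for a different time horizon.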
Keywords
microtext streams, clustering, event