Clustering memes in social media streams

ASONAM '13: Advances in Social Networks Analysis and Mining 2013, Niagara, Ontario, Canada, August 2013 (2014)

Abstract
Clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here, we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are then aggregated, based on multiple similarity measures, to obtain memes: cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them over time with new ones. We evaluate the framework on a dataset of Twitter trending topics. Over a 1-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that use only content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. Finally, we show that our online learning framework is flexible, because it is independent of the adopted clustering algorithm, and well suited to a streaming scenario.
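The online K-means variant with a memory mechanism can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measures, the assignment rule, or the forgetting policy, so the code below assumes a single cosine similarity over token counts, a fixed similarity threshold, and replacement of the least recently updated cluster when all K slots are in use. The names `protomeme_keys`, `vectorize`, and `ForgetfulOnlineKMeans` are hypothetical, as is the choice of hashtags, mentions, and URLs as protomeme tokens.

```python
import math
import re
from collections import Counter


def protomeme_keys(text):
    """Assumed protomeme tokens: hashtags, mentions, and URLs in a tweet."""
    return re.findall(r"#\w+|@\w+|https?://\S+", text.lower())


def vectorize(text):
    """Bag-of-words vector; hashtags and mentions keep their prefix."""
    return Counter(re.findall(r"[#@]?\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ForgetfulOnlineKMeans:
    """Streaming clusterer with at most k clusters (illustrative only)."""

    def __init__(self, k, threshold):
        self.k = k                  # maximum number of live clusters
        self.threshold = threshold  # assumed minimum similarity to join
        self.clusters = []          # dicts: centroid, size, last_seen

    def add(self, vector, step):
        """Assign one item arriving at time `step`; return its cluster index."""
        best, best_sim = None, -1.0
        for i, cluster in enumerate(self.clusters):
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = i, sim

        if best is not None and best_sim >= self.threshold:
            cluster = self.clusters[best]
            # Summed counts stand in for the mean: cosine ignores scale.
            cluster["centroid"].update(vector)
            cluster["size"] += 1
            cluster["last_seen"] = step
            return best

        fresh = {"centroid": Counter(vector), "size": 1, "last_seen": step}
        if len(self.clusters) < self.k:
            self.clusters.append(fresh)
            return len(self.clusters) - 1

        # Memory mechanism (assumed policy): forget the stalest cluster.
        stale = min(range(len(self.clusters)),
                    key=lambda i: self.clusters[i]["last_seen"])
        self.clusters[stale] = fresh
        return stale
```

To use the sketch, create `ForgetfulOnlineKMeans(k=2, threshold=0.3)` and call `add(vectorize(tweet), step)` for each tweet in arrival order. Once both slots are occupied, a tweet that matches no existing cluster overwrites the one that has gone longest without an update, which is one simple way to let old memes give way to new ones.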
Keywords
Cluster Algorithm, Ground Truth, Cosine Similarity, Twitter User, Stream Cluster