Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter.

ACM Trans. Internet Techn. (2014)

Abstract
Microblogging platforms such as Twitter have played an important role in recent cultural, social, and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification, or recommendation. However, traditional topic models that rely on the bag-of-words assumption are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics for understanding microblogs, and we refer to these semantics as auxiliary semantics. In this article, we address these issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We then extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. We propose the Multifaceted Topic Model (MfTM) to jointly model latent semantics among the social terms from Twitter, the auxiliary terms from the linked Web documents, and named entities. Moreover, we capture the temporal characteristics of each topic. We develop an efficient online inference method for MfTM, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering.
Keywords
Algorithms, Experimentation, Social media, topic model, unsupervised learning, semantic enrichment