Trend Detection in Text Collections using Latent Dirichlet Allocation Submitted for Blind Review

Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee Giles

semanticscholar(2007)

Abstract
Algorithms that automatically mine distinct topics in document collections have become increasingly important, both because of their applications in many fields and because of the extensive growth in the number of documents in many domains. Traditionally, topic discovery has mainly been addressed through algorithms that work on a snapshot view of the documents, which ignores the temporal characteristics of the collection. In a significant number of collections, the documents are temporal in nature, and this temporal dimension can influence the topic discovery process. In this paper, we propose a generative model based on latent Dirichlet allocation that integrates the temporal ordering of the documents into the generative process in an iterative fashion. The document collection is divided into time segments, and the topics discovered in each segment are propagated to influence the topic discovery in the subsequent time segments. We conduct experiments on a collection of academic papers from the CiteSeer repository. In addition to the textual content of the documents, we augment the text corpus with user queries and tags, and integrate the citation graph to boost the topical terms. The experimental results show that we can effectively detect distinct topics and their evolution over time.
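The segment-by-segment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not say how topics are propagated, so the rule used here (adding a scaled copy of the previous segment's topic-word distribution to the Dirichlet word prior of the next segment) is an assumption, as are the names `split_into_segments`, `gibbs_lda`, `track_topics` and the parameters `base_beta` and `carry`. The toy documents are invented for illustration only.

```python
import random
from collections import defaultdict


def split_into_segments(docs, segment_of):
    """Group (timestamp, tokens) pairs into time-ordered segments."""
    segments = defaultdict(list)
    for timestamp, tokens in docs:
        segments[segment_of(timestamp)].append(tokens)
    return [segments[key] for key in sorted(segments)]


def gibbs_lda(docs, vocab, beta, alpha=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a per-topic word prior beta[k][w]."""
    rng = random.Random(seed)
    num_topics = len(beta)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    beta_sum = [sum(b.values()) for b in beta]

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta[j][w])
                    / (topic_total[j] + beta_sum[j])
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Posterior mean of each topic's word distribution.
    return [
        {w: (topic_word[k][w] + beta[k][w]) / (topic_total[k] + beta_sum[k])
         for w in vocab}
        for k in range(num_topics)
    ]


def track_topics(segments, num_topics=2, base_beta=0.01, carry=5.0):
    """Fit LDA per segment, carrying topics forward through the word prior."""
    vocab = sorted({w for seg in segments for doc in seg for w in doc})
    beta = [dict.fromkeys(vocab, base_beta) for _ in range(num_topics)]
    history = []
    for seg_docs in segments:
        phi = gibbs_lda(seg_docs, vocab, beta)
        history.append(phi)
        # ASSUMED propagation rule (not taken from the paper): bias the next
        # segment's prior toward the topics found in this segment.
        beta = [{w: base_beta + carry * phi_k[w] for w in vocab} for phi_k in phi]
    return history


# Hypothetical toy corpus: (year, tokens).
docs = [
    (2001, ["svm", "kernel", "margin"]),
    (2001, ["web", "crawler", "index"]),
    (2002, ["svm", "kernel", "boosting"]),
    (2002, ["web", "search", "ranking"]),
]
segments = split_into_segments(docs, segment_of=lambda year: year)
history = track_topics(segments, num_topics=2)
```

Here `history[t][k]` is the word distribution of topic `k` in segment `t`. Because each segment's prior is seeded from the previous segment, topic `k` tends to keep its identity across segments, which is what makes it possible to compare the same index over time. A larger `carry` makes topics more stable, while a smaller one lets them drift more freely.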