External Evaluation of Topic Models: A Graph Mining Approach
ICDM (2013)
Abstract
Given a topic and its top-k most relevant words generated by a topic model, how can we tell whether it is a low-quality or a high-quality topic? Topic models provide a low-dimensional representation of large document corpora and drive many important applications, such as summarization, document segmentation, and word-sense disambiguation. Evaluating topic models is an important issue, since low-quality topics can degrade the performance of these applications. In this paper, we develop a graph mining and machine learning approach for the external evaluation of topic models. Using graph-centric features extracted from the projection of topic words onto the Wikipedia page-links graph, we learn models that predict the human-perceived quality of topics (based on human judgments) and classify them as high or low quality. Experiments on four real-world corpora show that our approach improves prediction performance by up to 30% over three baselines of varying complexity, and demonstrate that the method generalizes to diverse domains. In addition, we provide an interpretation of our models and outline the characteristics that discriminate between high- and low-quality topics.
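To make the "projection of topic words onto the page-links graph" step concrete, here is a minimal sketch under stated assumptions. It assumes each topic word maps directly to a Wikipedia page of the same name, and it represents the page-links graph as a plain dict from a page title to the set of titles it links to (hypothetical toy data). The features computed here (edge count, density, connected components, isolated nodes) are illustrative graph statistics chosen for this example; the abstract does not list the paper's actual feature set or classifier.

```python
def induced_subgraph(topic_words, page_links):
    """Keep only links between topic words; treat them as undirected edges.

    page_links: hypothetical dict mapping a page title to the set of titles it links to.
    Assumes each topic word is itself a page title (an illustrative simplification).
    """
    nodes = set(topic_words)
    adj = {w: set() for w in nodes}
    for src, targets in page_links.items():
        if src not in nodes:
            continue
        for dst in targets:
            if dst in nodes and dst != src:
                adj[src].add(dst)
                adj[dst].add(src)
    return adj


def connected_components(adj):
    """Return the connected components of an undirected adjacency dict (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


def graph_features(topic_words, page_links):
    """Illustrative graph-centric features of the projected topic subgraph."""
    adj = induced_subgraph(topic_words, page_links)
    n = len(adj)
    m = sum(len(nbs) for nbs in adj.values()) // 2  # each undirected edge counted twice
    comps = connected_components(adj)
    return {
        "num_nodes": n,
        "num_edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "num_components": len(comps),
        "largest_component_frac": max(len(c) for c in comps) / n if n else 0.0,
        "isolated_frac": sum(1 for nbs in adj.values() if not nbs) / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy link graph: three related words and one unrelated word.
    links = {"Apple": {"Fruit", "Orchard"}, "Fruit": {"Apple"}, "Orchard": {"Fruit"}, "Guitar": set()}
    print(graph_features(["Apple", "Fruit", "Orchard", "Guitar"], links))
```

The intuition behind such features is that words in a coherent topic tend to link to one another, giving a denser, more connected subgraph, whereas an unrelated word shows up as an isolated node. In a full pipeline, feature vectors like these would be paired with human quality judgments to train a classifier.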
Keywords
human evaluation,human-perceived quality,low-quality topic,summarization,top-k most relevant words,learning (artificial intelligence),pattern classification,topic quality,low-dimensional representation,large document corpora,topic models,feature extraction,web sites,high-quality topic,machine learning approach,graph mining,data mining,graph theory,topic word projection,natural language processing,topic model,document segmentation,text analysis,word sense disambiguation,graph-centric feature extraction,wikipedia page-links graph,learning artificial intelligence