Learning Latent Topics from the Word Co-occurrence Network.

Communications in Computer and Information Science (2017)

Abstract
Topic modeling is widely used to uncover the latent thematic structure of a corpus. The spectral method, which relies on the separability assumption, works with word co-occurrence patterns at the document level and proceeds in two steps: anchor selection and topic recovery. The Biterm Topic Model (BTM), in contrast, uses word co-occurrence patterns across the whole corpus. Inspired by the word-pair pattern in BTM, we build a Word Co-occurrence Network (WCN) in which nodes correspond to words and edge weights are the empirical co-occurrence probabilities of word pairs. We then apply existing graph methods to the WCN to select anchor words: we find either a K-clique in the unweighted complementary graph or the maximum edge-weight clique in the weighted complementary graph. Experiments on real-world corpora, evaluated on topic quality and interpretability, demonstrate the effectiveness of the proposed approach.
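The anchor selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_wcn`, `complement_graph`, `find_k_clique`), the use of a whole document as the co-occurrence context, the `threshold` parameter, and the plain backtracking clique search are all assumptions made for illustration. The sketch builds a WCN from word pairs, forms the unweighted complementary graph by linking words that do not co-occur, and searches for a K-clique whose members serve as candidate anchor words.

```python
from collections import Counter
from itertools import combinations


def build_wcn(docs):
    """Map each unordered word pair (u, v), u < v, to its empirical
    co-occurrence probability. Assumption: two words co-occur when they
    appear in the same document."""
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def complement_graph(vocab, wcn, threshold=0.0):
    """Unweighted complementary graph: link two words when their
    co-occurrence probability is at most `threshold` (hypothetical knob)."""
    adj = {w: set() for w in vocab}
    for u, v in combinations(sorted(vocab), 2):
        if wcn.get((u, v), 0.0) <= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def find_k_clique(adj, k):
    """Return some clique of size k as a list, or None if none exists.
    Plain backtracking; exponential in the worst case."""
    def extend(clique, candidates):
        if len(clique) == k:
            return clique
        for w in sorted(candidates):
            # Keep only later neighbours so each clique is tried once.
            later = {c for c in candidates & adj[w] if c > w}
            found = extend(clique + [w], later)
            if found is not None:
                return found
        return None

    return extend([], set(adj))


# Toy corpus with three themes (made-up data for illustration only).
docs = [
    ["apple", "fruit", "juice"], ["apple", "fruit"],
    ["engine", "car", "fuel"], ["car", "engine"],
    ["goal", "team", "match"], ["team", "goal"],
]
vocab = {w for doc in docs for w in doc}
wcn = build_wcn(docs)
anchors = find_k_clique(complement_graph(vocab, wcn), k=3)
```

On this toy corpus the selected words never co-occur with one another, so each one comes from a different theme. The weighted variant mentioned in the abstract (maximum edge-weight clique) would need edge weights on the complementary graph and a different search, which this sketch does not attempt.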
Keywords
Topic model, Word co-occurrence network, Maximum edge-weight clique, K-clique