Statistical word sense aware topic models

Soft Computing (2014)

Abstract
LDA has proved effective in modeling the semantic relations between surface words, and this semantic information in a document collection is useful for estimating the topic distribution of a document. In general, a surface word may contribute significantly to several topics in a document collection. LDA measures the contribution of a surface word to each topic and treats a surface word as identical across all documents. However, a surface word may carry different signatures in different contexts; i.e., polysemous words can be used with different senses in different contexts. Intuitively, disambiguating word senses for topic models can enhance their discriminative capabilities. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on pre-defined word sense resources, we capture word sense information via a latent variable and induce the senses directly from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms the baselines in document clustering and also improves word sense induction over a standalone non-parametric model.
Keywords
Topic modeling, Word sense induction, Document representation, Document clustering
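
To make the idea in the abstract concrete, here is a minimal sketch of a collapsed Gibbs sampler in which every token is assigned both a topic and a sense, so that senses are induced jointly with topics from raw text without any pre-defined sense inventory. This is an illustration under assumptions, not the paper's exact model: the toy corpus, the hyperparameters, and the fixed number of senses per word are all hypothetical choices for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus; each document is a list of surface words.
docs = [["bank", "river", "water"], ["bank", "money", "loan"],
        ["river", "water", "flow"], ["money", "loan", "interest"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

K, S, V = 2, 2, len(vocab)       # topics, senses per word, vocabulary size (assumed)
alpha, beta = 0.1, 0.01          # symmetric Dirichlet priors (assumed)

nd_k = np.zeros((len(docs), K))  # document-topic counts
nk_vs = np.zeros((K, V, S))      # topic-(word, sense) counts
nk = np.zeros(K)                 # tokens per topic

# Random initialisation of a (topic, sense) pair for every token.
assign = []
for d, doc in enumerate(docs):
    cur = []
    for w in doc:
        v, z, s = w2i[w], rng.integers(K), rng.integers(S)
        nd_k[d, z] += 1; nk_vs[z, v, s] += 1; nk[z] += 1
        cur.append([v, z, s])
    assign.append(cur)

for _ in range(200):             # Gibbs sweeps
    for d, doc in enumerate(assign):
        for tok in doc:
            v, z, s = tok
            # Remove the token's current assignment from the counts.
            nd_k[d, z] -= 1; nk_vs[z, v, s] -= 1; nk[z] -= 1
            # Joint conditional over all (topic, sense) pairs for this token.
            p = ((nd_k[d][:, None] + alpha)
                 * (nk_vs[:, v, :] + beta)
                 / (nk[:, None] + V * S * beta))
            p = p.ravel(); p /= p.sum()
            z, s = divmod(rng.choice(K * S, p=p), S)
            nd_k[d, z] += 1; nk_vs[z, v, s] += 1; nk[z] += 1
            tok[1], tok[2] = z, s

theta = (nd_k + alpha) / (nd_k + alpha).sum(axis=1, keepdims=True)
print("document-topic mixtures:\n", theta.round(2))
print("sense counts for 'bank' per topic:\n", nk_vs[:, w2i["bank"], :])

The printout is only meant to show the bookkeeping the abstract implies: each token carries a (topic, sense) pair, the document-topic mixtures are what a document-clustering evaluation would use, and the per-topic sense counts are what a word sense induction evaluation would inspect. The actual model in the paper differs in its priors and inference details.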