The research on text clustering based on LDA joint model.

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS(2017)

Abstract
This paper proposes a clustering algorithm that combines the LDA (Latent Dirichlet Allocation) probabilistic topic model with the VSM (Vector Space Model), adopting a three-tier framework of text, topic and feature word. Although LDA alone can uncover hidden topic knowledge, its low-dimensional representation struggles to preserve the full information in a text, which limits its capacity to distinguish between texts. The paper therefore carries out cluster analysis in terms of both feature words and topics by integrating the two models. A better mix of LDA and VSM improves the clustering effect, while the optimal cluster number K of the K-means algorithm and the optimal topic number T of the LDA model are determined in parallel. To make the algorithm design more rigorous and effective, the silhouette coefficient and the Dunn coefficient are introduced as evaluation measures.
Keywords
Text cluster, LDA model, K-means algorithms, VSM model, silhouette coefficient, Dunn coefficient
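
The abstract names the silhouette coefficient and the Dunn coefficient as evaluation measures but gives no formulas. As a minimal sketch, assuming Euclidean distance and the standard textbook definitions of both indices, the Python below scores a cluster labelling of toy points. The helper `joint_vector` and its `weight` parameter are hypothetical: they illustrate one possible way to mix a VSM vector with an LDA topic vector and are not taken from the paper.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def joint_vector(vsm_vec, topic_vec, weight):
    """Hypothetical mix (an assumption, not the paper's formula):
    scale the VSM part by `weight`, the topic part by 1 - weight,
    and concatenate the two."""
    return [weight * x for x in vsm_vec] + [(1 - weight) * t for t in topic_vec]


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: silhouette is conventionally 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn(points, labels):
    """Dunn index: smallest between-cluster distance divided by
    the largest within-cluster distance (higher is better)."""
    min_between = math.inf
    max_within = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        if labels[i] == labels[j]:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_within if max_within > 0 else math.inf


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # labelling that matches the visible groups
bad = [0, 1, 0, 1]   # labelling that splits each group

print(silhouette(points, good), dunn(points, good))
print(silhouette(points, bad), dunn(points, bad))
```

In a setting like the one the abstract describes, scores of this kind could be computed for each candidate K (and T) and the best-scoring combination kept. How the paper actually performs that search is not stated in the abstract.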