Key Word Extraction For Short Text Via Word2vec, Doc2vec, And Textrank

Jun Li, Guimin Huang, Chunli Fan, Zhenglin Sun, Hongtao Zhu

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES (2019)

Abstract
The rapid development of social media encourages people to share their opinions and feelings on the Internet. Every day, a large number of short text comments are generated through Twitter, microblogging, WeChat, etc., and there is high commercial and social value in extracting useful information from these short texts. At present, most studies have focused on extracting key words from text. For example, the LDA topic model performs well on long texts, but it loses effectiveness on short texts because of noise and sparsity problems. In this paper, we attempt to use Word2Vec and Doc2Vec to improve short-text key word extraction. We first train word vectors and paragraph vectors collaboratively and then apply clustering to the nodes of the TextRank model. We adjust the weights of the generated key words by computing the jump probability between nodes, obtain a weighted score for each node, and finally rank the key words by that score. The experimental results show that the improved method performs well on the datasets.
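The abstract does not give the exact jump-probability formula, so the following is only a minimal sketch of the general idea, not the authors' method: a TextRank-style ranking in which the probability of jumping from one word node to a neighbor is proportional to the cosine similarity of their embeddings. The function names, the window size, and the tiny hand-written `toy_vectors` (standing in for vectors that Word2Vec or Doc2Vec would learn) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_textrank(tokens, vectors, window=3, damping=0.85, iters=50):
    """Rank words with a similarity-weighted TextRank (illustrative only)."""
    # Undirected co-occurrence graph; each edge weight accumulates the
    # (non-negative) embedding similarity of the two words.
    weights = defaultdict(dict)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            x = tokens[j]
            if x == w:
                continue
            sim = max(cosine(vectors[w], vectors[x]), 0.0)
            weights[w][x] = weights[w].get(x, 0.0) + sim
            weights[x][w] = weights[x].get(w, 0.0) + sim

    nodes = sorted(set(tokens))
    scores = {n: 1.0 / len(nodes) for n in nodes}
    out_sum = {n: sum(weights[n].values()) for n in nodes}

    for _ in range(iters):
        new_scores = {}
        for n in nodes:
            rank = 0.0
            for m, w_mn in weights[n].items():
                if out_sum[m] > 0:
                    # Jump probability m -> n: edge weight over m's total weight.
                    rank += scores[m] * w_mn / out_sum[m]
            new_scores[n] = (1 - damping) / len(nodes) + damping * rank
        scores = new_scores

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 2-D embeddings; real ones would come from a trained model.
toy_vectors = {
    "phone": [0.9, 0.2],
    "battery": [0.8, 0.4],
    "screen": [0.7, 0.5],
    "price": [0.2, 0.9],
}
tokens = ["phone", "battery", "screen", "battery", "price", "phone", "battery"]
ranked = weighted_textrank(tokens, toy_vectors)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

Weighting the transitions by embedding similarity, rather than by raw co-occurrence counts alone, is one plausible way to compensate for the sparsity of short texts; the paper's own scoring may differ in its details.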
Keywords
Key word extraction, short text, word2vec, doc2vec, textrank