An Improved LDA Multi-document Summarization Model Based on TensorFlow

2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI) (2017)

Abstract
Latent Dirichlet Allocation (LDA) has recently been used to automatically generate topics from text corpora, and it has been applied to multi-document summarization algorithms based on sentence extraction. In this paper, we propose a novel approach to automatically generating aspect-oriented summaries from multiple documents. Our approach combines a traditional summary generation algorithm with a deep-learning-based abstract generation algorithm. We use the improved traditional summary generation algorithm to condense multiple documents into a single document, and then apply the deep learning method to that single document to obtain the final summary. First, we apply an improved LDA model to cluster the sentences of all documents. Second, we use an extended LexRank algorithm to rank the sentences within each cluster. Third, we use an extended Hedge Trimmer algorithm for sentence compression. Fourth, we apply Integer Linear Programming (ILP) for sentence selection; this step yields the single document. Finally, we use textsum on TensorFlow to generate the final abstract. The experiments showed that the proposed algorithm achieved better performance than other state-of-the-art algorithms on the DUC2005 and TAC2010 corpora.
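The second step ranks the sentences of each cluster with LexRank. The paper uses an extended variant whose details are not given here, so the following is only a minimal sketch of standard LexRank, not the authors' method. It builds a graph from idf-weighted cosine similarity, keeps edges above a threshold, and scores sentences with damped power iteration. The tokenizer, the +1 idf smoothing, the threshold of 0.1 and the damping factor of 0.85 are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumed toy tokenizer: whitespace split, strip basic punctuation, lowercase.
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def idf_weights(tokenized):
    # Each sentence is treated as a "document" for idf; +1 smoothing is an assumption.
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    return {w: math.log(n / df[w]) + 1.0 for w in df}


def idf_cosine(a, b, idf):
    # idf-modified cosine similarity between two token lists.
    ta, tb = Counter(a), Counter(b)
    num = sum(ta[w] * tb[w] * idf[w] ** 2 for w in ta if w in tb)
    na = math.sqrt(sum((ta[w] * idf[w]) ** 2 for w in ta))
    nb = math.sqrt(sum((tb[w] * idf[w]) ** 2 for w in tb))
    return num / (na * nb) if na and nb else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, tol=1e-8, max_iter=200):
    """Return one centrality score per sentence (scores sum to 1)."""
    tokenized = [tokenize(s) for s in sentences]
    idf = idf_weights(tokenized)
    n = len(sentences)
    # Unweighted graph: an edge wherever similarity exceeds the threshold.
    adj = [[1.0 if idf_cosine(tokenized[i], tokenized[j], idf) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize into a transition matrix; an edgeless row jumps uniformly.
    trans = []
    for row in adj:
        deg = sum(row)
        trans.append([x / deg for x in row] if deg else [1.0 / n] * n)
    # Damped power iteration: p_j = (1 - d)/n + d * sum_i trans[i][j] * p_i.
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n + damping * sum(trans[i][j] * p[i] for i in range(n))
               for j in range(n)]
        if sum(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p


if __name__ == "__main__":
    # Hypothetical cluster: the first sentence overlaps with both of the others.
    cluster = ["alpha beta", "alpha gamma", "beta delta"]
    scores = lexrank(cluster)
    ranked = sorted(range(len(cluster)), key=lambda i: scores[i], reverse=True)
    print([cluster[i] for i in ranked])
```

In this toy cluster the first sentence shares a word with each of the other two, while those two share nothing with each other. The first sentence is therefore the most central and ranks first. In the pipeline described above, the top-ranked sentences of each cluster would then go on to compression and ILP-based selection.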
Keywords: Natural Language Processing, multi-document summarization, LDA, TensorFlow