Optimizing word set coverage for multi-event summarization

Journal of Combinatorial Optimization(2015)

引用 6|浏览21
暂无评分
摘要
We have witnessed the proliferation of the Internet over the past few decades. A large amount of textual information is generated on the Web. It is impossible to locate and digest all the latest updates available on the Web for individuals. Text summarization would provide an efficient way to generate short, concise abstracts from the massive documents. These massive documents involve many events which are hard to be identified by the summarization procedure directly. We propose a novel methodology that identifies events from these text corpora and creates summarization for each event. We employ a probabilistic, topic model to learn the potential topics from the massive documents and further discover events in terms of the topic distributions of documents. To target the summarization, we define the word set coverage problem (WSCP) to capture the most representative sentences to summarize an event. For getting solution of the WSCP, we propose an approximate algorithm to solve the optimization problem. We conduct a set of experiments to evaluate our proposed approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, our proposed method outperforms competitive baselines by considering the harmonic mean of coverage and conciseness.
更多
查看译文
关键词
Event summarization,Word set,Set coverage,Optimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要