ClusterVote: Automatic Summarization Dataset Construction with Document Clusters.

SPECOM(2022)

Abstract
Creating a summarization dataset is costly because composing quality summaries requires considerable expertise and human effort. To alleviate the issue, several pseudo-summary approaches have been developed, but because they lack a domain adaptation mechanism, they have not been applied beyond language model pretraining. We find that this shortcoming can be overcome by leveraging document clusters. We propose ClusterVote, a pseudo-summarization approach that accounts for domain summarization patterns by studying links between related documents. The method can be configured for different levels of granularity and can produce both extractive and abstractive summaries. We evaluate the approach by collecting a Telegram news summarization dataset and testing state-of-the-art models on it. The experimental results show that the most refined variant of ClusterVote has extractive properties similar to those of the CNN/Daily Mail dataset and proves challenging for summarization systems.
Keywords
Abstractive summarization,Dataset for summarization,Clustering