A Graph and PhoBERT based Vietnamese Extractive and Abstractive Multi-Document Summarization Frame

2022 RIVF International Conference on Computing and Communication Technologies (RIVF)(2022)

Abstract
Many methods have been proposed for Multi-Document Summarization (MDS), both extractive and abstractive, but models that rely on only one of the two types still have their own disadvantages. A promising approach to the MDS problem is to combine extractive and abstractive summarization. For many languages, and Vietnamese in particular, studies that propose such a combination are still very limited, and the approach has not been explored in depth. In this paper, we propose a new MDS frame consisting of two components in a pipeline architecture that combines extractive and abstractive approaches for Vietnamese MDS. The first component uses an extractive approach to select the most important sentences in each document by constructing graphs whose nodes represent the sentences of the input documents and whose edges represent the relationships between sentences. The selected sentences are clustered into groups of sentences with similar meaning, and each group is then combined into a document. The second component uses an abstractive approach, applying the PhoBERT2PhoBERT model to generate the final summary. The frame achieved positive results, with ROUGE-2 scores of 36.42 and 34.89 percent on the ViMs and VN-MDS datasets, respectively.
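The extractive component described above (sentences as nodes, sentence relationships as edges) can be illustrated with a minimal sketch. The abstract does not say how edges are weighted or how nodes are scored, so everything specific here is an assumption made for illustration: edges are weighted by Jaccard word overlap, nodes are scored with a PageRank-style iteration, and the function names (`build_graph`, `rank_sentences`, `select_top`) are hypothetical. It is not the authors' implementation.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Crude lowercase word set; a stand-in for real Vietnamese word segmentation."""
    return set(re.findall(r"\w+", sentence.lower()))


def build_graph(sentences):
    """Return a symmetric weight matrix: nodes are sentences, edges are Jaccard overlap.

    Assumption: the paper's actual edge definition is not given in the abstract.
    """
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = tokens[i] | tokens[j]
        if union:
            w = len(tokens[i] & tokens[j]) / len(union)
            weights[i][j] = weights[j][i] = w
    return weights


def rank_sentences(weights, damping=0.85, iterations=50):
    """Score nodes with a PageRank-style power iteration (assumed scoring method)."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def select_top(sentences, k):
    """Keep the k highest-scoring sentences, returned in their original order."""
    scores = rank_sentences(build_graph(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In the pipeline the abstract describes, the sentences selected this way would then be clustered by meaning and each cluster passed to the abstractive model; those later stages are not sketched here.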
Keywords
Vietnamese extractive, multi-document