Improving extractive summarization with semantic enhancement through topic-injection based BERT model

INFORMATION PROCESSING & MANAGEMENT (2024)

Abstract
In the field of text summarization, extractive techniques aim to select key sentences from a document to form a summary. However, traditional methods are not sensitive enough to the core semantics of the text, producing summaries that are difficult to comprehend. Recently, topic extraction technology has made it possible to recover the core semantics of a text, enabling summaries that accurately capture the main points of a document. In this paper, we introduce Topic-Injected Bidirectional Encoder Representations from Transformers (TP-BERT), a novel neural auto-encoder model designed explicitly for extractive summarization. TP-BERT integrates document-related topic words into sentences, improving contextual understanding and aligning summaries more closely with a document's main theme, thereby addressing a key shortfall of traditional extractive methods. Another major innovation of TP-BERT is the use of contrastive learning during training, which improves summarization by giving prominence to key sentences and suppressing peripheral information. Additionally, we conducted ablation and parameter studies of TP-BERT on the CNN/DailyMail, WikiHow, and XSum datasets. In our two main experiments, the average ROUGE-F1 score improved by 2.69 and 0.45 across the three datasets. Compared with baseline methods, TP-BERT demonstrated better performance, as reflected in the increased ROUGE-F1 scores on all three datasets. Moreover, the semantic differentiation between sentence representations also contributed positively to the performance gains.
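To make the two ideas concrete, the sketch below illustrates one plausible realization of topic injection and sentence-level contrastive learning: topic words are prepended to each sentence before encoding, and an InfoNCE-style loss over sentence embeddings pulls key sentences together while pushing peripheral ones away. The function names, the `[SEP]`-prefix injection scheme, and the exact loss formulation are illustrative assumptions, not the paper's verified method.

```python
import torch
import torch.nn.functional as F


def inject_topics(sentences, topic_words):
    """Prepend document-level topic words to every sentence before encoding.

    A hypothetical realization of TP-BERT's topic injection; the paper's
    exact fusion mechanism may differ.
    """
    prefix = " ".join(topic_words)
    return [f"{prefix} [SEP] {s}" for s in sentences]


def contrastive_loss(sent_emb, key_mask, temperature=0.1):
    """InfoNCE-style loss over sentence embeddings.

    Pulls embeddings of key (summary-worthy) sentences toward each other
    and pushes them away from peripheral sentences.

    sent_emb: (n, d) float tensor of sentence embeddings from the encoder
    key_mask: (n,) bool tensor, True for sentences labeled as key
    """
    emb = F.normalize(sent_emb, dim=-1)
    sim = emb @ emb.T / temperature          # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))        # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=-1, keepdim=True)

    pos = key_mask.unsqueeze(0) & key_mask.unsqueeze(1)
    pos.fill_diagonal_(False)                # a sentence is not its own positive
    pos_counts = pos.sum(-1).clamp(min=1)
    # negative average log-likelihood of each key sentence's positive pairs
    per_sent = -log_prob.masked_fill(~pos, 0.0).sum(-1) / pos_counts
    return per_sent[key_mask].mean()


# Toy usage: four sentences, two of them labeled as key sentences.
sentences = inject_topics(
    ["The model improves ROUGE scores.", "The weather was mild.",
     "Topic words guide sentence selection.", "Lunch was served."],
    topic_words=["summarization", "topic"],
)
emb = torch.randn(4, 16)                     # stand-in for encoder outputs
mask = torch.tensor([True, False, True, False])
print(contrastive_loss(emb, mask))
```

In this sketch the contrastive objective operates on whatever sentence embeddings the encoder produces, so it can be combined with a standard extractive scoring head; how TP-BERT weights the two terms is not specified here.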
Keywords
Extractive summarization, Topic model, Transformer, Information fusion