Using Topic in Summarization for Vietnamese Paragraph

Dat Tien Dieu, Dien Dinh

International Journal of Advanced Computer Science and Applications (2023)

Abstract
This article investigates whether the accuracy of automatic text summarization can be improved by exploiting the underlying topics of the documents. Our training data come from the VNDS dataset (A Vietnamese Dataset for Summarization), which contains 150,704 samples collected from online news sources such as vnexpress.net and tuoitre.vn. The articles were preprocessed to fit our training objectives and criteria. We present a topic-oriented approach to summarization that uses Latent Dirichlet Allocation (LDA) to identify each document's subject matter. The resulting data are then fed into a BERT model as one component of the broader abstractive summarization task, that is, summarizing content based on its key concepts. The results are modest and reflect the challenges we encountered; the model needs further development and refinement to reach its full potential.
Keywords
Automatic text summarization, theme-based approach, BERT model, Latent Dirichlet Allocation
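
The abstract names two stages, topic identification with Latent Dirichlet Allocation and a BERT-based summarizer, but does not say how the two are connected. The following is a minimal sketch of one plausible reading, not the authors' implementation. It fits LDA with a small collapsed Gibbs sampler (an inference method chosen here for self-containment) and prepends the dominant topic's top words to each document before it would be passed to a summarizer. The functions `lda_gibbs` and `topic_prefixed_input`, the `[SEP]` joining convention, the hyperparameters, and the toy English tokens are all assumptions for illustration; real Vietnamese text would first need word segmentation.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns per-document topic counts and per-topic word counts.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def topic_prefixed_input(doc, doc_topic_counts, topic_word, top_n=3):
    """Prepend the dominant topic's top words to the document text.

    The "[SEP]" separator is a hypothetical convention, not taken from the paper.
    """
    dominant = max(range(len(doc_topic_counts)), key=doc_topic_counts.__getitem__)
    keywords = [w for w, c in topic_word[dominant].most_common(top_n) if c > 0]
    return " ".join(keywords) + " [SEP] " + " ".join(doc)


# Toy pre-tokenized documents standing in for segmented news articles.
docs = [
    "match goal team player team goal".split(),
    "market stock bank price market bank".split(),
    "team player coach match goal".split(),
    "bank price stock investor market".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
model_inputs = [
    topic_prefixed_input(doc, doc_topic[d], topic_word) for d, doc in enumerate(docs)
]
for text in model_inputs:
    print(text)
```

Each string in `model_inputs` could then be tokenized and passed to whatever BERT-based summarizer is in use. Prepending topic words is only one way to inject topic information; topic distributions could equally be supplied as extra features, and the abstract does not indicate which design the authors chose.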