Abstractive Text Summarization of Hindi Corpus Using Transformer Encoder-Decoder Model

International Symposium on Intelligent Informatics (2023)

Abstract
Abstractive text summarization is the task of generating a concise summary that captures the principal ideas of the source text and may contain new phrases that do not appear in the original. Although it has been widely studied for languages like English and French, research on regional languages like Hindi is still in its early stages owing to the scarcity of data. We propose a novel approach for building an abstractive text summarizer for a Hindi corpus using the Transformer encoder-decoder architecture. First, efficient pre-trained word representations are generated using Facebook's fastText model. Next, the Transformer model is employed to capture contextual dependencies and yield richer semantic representations for a morphologically rich language like Hindi, producing an abstractive summary. In an experimental evaluation on a Hindi news dataset for generating news article headlines, we achieve ROUGE-1 precision and recall scores of 0.682 and 0.598, respectively, outperforming state-of-the-art techniques.
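The pipeline described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): Facebook's publicly released pre-trained Hindi fastText vectors (cc.hi.300.bin) supply the word embeddings, and PyTorch's nn.Transformer stands in for the encoder-decoder. The hyperparameters, the toy article/headline pair, the whitespace tokenization, and the omission of BOS/EOS handling, padding masks, and batching are all simplifying assumptions, not details from the paper.

```python
# Minimal sketch, assuming the pre-trained Hindi fastText vectors
# cc.hi.300.bin (from fasttext.cc) are available locally. Hyperparameters
# and the toy example are illustrative, not taken from the paper.
import fasttext
import torch
import torch.nn as nn

EMB_DIM = 300  # dimension of the published fastText vectors

ft = fasttext.load_model("cc.hi.300.bin")  # pre-trained Hindi word vectors

def embed(tokens):
    """Look up fastText vectors for a token list -> (seq_len, 1, 300)."""
    vecs = [torch.from_numpy(ft.get_word_vector(t)) for t in tokens]
    return torch.stack(vecs).unsqueeze(1)  # add a batch dimension

class Summarizer(nn.Module):
    """Transformer encoder-decoder over fastText embeddings."""
    def __init__(self, vocab_size, d_model=EMB_DIM, nhead=6,
                 num_layers=3, dim_ff=1024):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff)
        self.out = nn.Linear(d_model, vocab_size)  # project to vocabulary

    def forward(self, src, tgt):
        # Causal mask so the decoder cannot attend to future summary tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(0))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)  # (tgt_len, batch, vocab_size)

# Toy usage: a single article/headline pair with whitespace tokenization.
# A real setup would prepend BOS, shift the decoder targets, and pad batches.
article = "भारत में आज भारी बारिश हुई".split()
headline = "भारी बारिश".split()
vocab = {w: i for i, w in enumerate(sorted(set(article + headline)))}

model = Summarizer(vocab_size=len(vocab))
logits = model(embed(article), embed(headline))
target_ids = torch.tensor([vocab[w] for w in headline]).unsqueeze(1)
loss = nn.CrossEntropyLoss()(logits.view(-1, len(vocab)), target_ids.view(-1))
print("toy cross-entropy loss:", loss.item())
```

At inference time the decoder would be run autoregressively, feeding back its own predictions token by token to produce the headline; the sketch above only shows the teacher-forced training-style forward pass.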
Keywords
Hindi corpus, summarization, text, encoder-decoder