Using Statistical and Semantic Analysis for Arabic Text Summarization

International Conference on Information Technology (2017)

Abstract
Automatic text summarization is an essential tool for overcoming the problem of information overload. So far, this field has not been studied enough for the Arabic language, and only a few related works are currently available. Arabic text summarization faces two main issues: how to extract semantic relationships between textual units and how to deal with redundancy. To overcome these problems, we propose in this paper a hybrid method to generate an extractive summary of Arabic documents. Our approach is based on a two-dimensional, undirected, weighted graph with sentences as nodes, in which each pair of sentences is connected by two edges representing the statistical and the semantic similarity measures. The statistical similarity measure builds on the content overlap between two sentences, while the semantic one is based on semantic information extracted from the Arabic WordNet (AWN) ontology. Then, the score of each sentence is computed by running the PageRank ranking algorithm on the generated graph. Thereafter, the score of each sentence is refined by adding other statistical features of the text, such as TF.ISF and sentence position. The final summary is built by selecting the top-ranking sentences. Finally, we deal with redundancy and information diversity issues by using an adapted maximal marginal relevance (MMR) method. Experimental results on the EASC dataset show that our proposed approach outperforms some existing Arabic summarization systems.
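The pipeline described in the abstract (similarity graph, PageRank scoring, MMR selection) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names, the use of Jaccard overlap as the statistical measure, the linear blend `alpha` of the two edge weights into one, the damping factor, and the MMR trade-off `lam`. The AWN-based semantic measure is represented only by an optional, hypothetical `semantic_similarity` callable, and the TF.ISF and sentence-position features are omitted.

```python
def overlap_similarity(a, b):
    """Jaccard overlap between two token lists (assumed statistical measure)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by fixed-count power iteration on an adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                # Node j passes its score to i in proportion to edge weight.
                if out_sum[j] > 0:
                    rank += weights[j][i] / out_sum[j] * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def mmr_select(scores, weights, k, lam=0.7):
    """Greedy MMR: trade relevance against similarity to already chosen sentences."""
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((weights[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        # sorted() makes ties resolve to the lowest sentence index.
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)


def summarize(sentences, k=2, alpha=0.5, semantic_similarity=None):
    """Return k sentences in document order (hypothetical end-to-end helper)."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                stat = overlap_similarity(tokens[i], tokens[j])
                # Without a semantic measure, fall back to the statistical one.
                sem = semantic_similarity(tokens[i], tokens[j]) if semantic_similarity else stat
                weights[i][j] = alpha * stat + (1 - alpha) * sem
    scores = pagerank(weights)
    top = max(scores)
    scores = [s / top for s in scores]  # rescale so relevance is comparable to similarity
    return [sentences[i] for i in mmr_select(scores, weights, k)]
```

Whitespace tokenization is used only to keep the sketch short; real Arabic text would normally need normalization and stemming before the overlap is computed.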
Keywords
Arabic text summarization, Arabic NLP, Statistical approach, Semantic approach, AWN, Graph model