A Semantic Text Expansion for Paraphrasing Identification in Arabic Microblog Posts.

MEDES(2022)

引用 0|浏览4
暂无评分
摘要
An enormous number of microblogs are being created and posted on the web each day. Many of these microblogs are repetitive in terms of content and similar in terms of topic. Being able to detect repetitive content can support various applications such as question answering and trendy topic detection. In this research, we aim to propose a model to detect paraphrasing among Arabic tweets, in addition to identifying tweets belonging to the same topic. The proposed model is based on Latent Dirichlet Allocation (LDA) topic modeling, as well as, semantic text expansion utilizing external resources i.e. BabelNet and Wikipedia. Tweets from multiple Arabic news agencies were collected, preprocessed, and divided into two groups. The first group was used to build the topic modeling and the other group of tweets was paired and classified based on the topic distributions. The results are promising in terms of precision on tweet pairs with a certain time overlap. The best-reported precision is 80.1% achieved using Wikipedia embedded content on the stemmed text mode with a large number of LDA topics.
更多
查看译文
关键词
Arabic Paraphrase Identification, Arabic Semantic Text Expansion, Arabic Topic Modelling
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要