Building an Optimal Dataset for Arabic Fake News Detection

Mohammad A. Bsoul, Abdallah Qusef, Saleh Abu-Soud

Procedia Computer Science (2022)

Abstract
Fake news detection for Arabic news has drawn some attention recently. However, the number of such studies is limited by the lack of datasets that can be used to conduct them. Clickbait detection is typically linked to fake news detection, as clickbait is effective in spreading fake news. The lack of Arabic-language datasets for studying clickbait detection models is also evident. This paper presents, for the first time, a dataset of Arabic clickbait news. The purpose of this dataset is to enable the automatic classification of news headlines as “Clickbait” or “Not Clickbait” using a machine learning model. More than 3,000 news records are sampled from five months of tweets by 24 Jordanian news publishers. All sampled records are labeled by three annotators, and 18% of them are labeled as clickbait. The annotators unanimously agreed on the class of about 81% of the labeled records. To showcase the usability of the resulting dataset for machine learning, Logistic Regression, Support Vector Machine, Random Forest, Naïve Bayes, Stochastic Gradient Descent, Nearest Neighbor, and Decision Tree classifiers are applied to it. These models achieve macro F1-scores of up to 0.81, indicating that the automatic detection of clickbait news headlines using machine learning is feasible.
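The abstract reports two quantitative measures: the share of records on which all three annotators agreed, and the macro F1-score of the classifiers. The following is a minimal Python sketch of how these could be computed. It assumes, hypothetically, that each record's final label is the majority vote of the three annotators (the abstract does not state the aggregation rule). The function names `aggregate` and `macro_f1` and the toy records are illustrative only and are not the paper's code or data.

```python
from collections import Counter

# Class names taken from the abstract.
C = "Clickbait"
N = "Not Clickbait"
LABELS = (C, N)


def aggregate(annotations):
    """Return (final labels, fraction of records with unanimous agreement).

    annotations: one tuple of annotator labels per news record.
    ASSUMPTION (hypothetical): the final label is the majority vote. With
    two classes and three annotators, a strict majority always exists.
    """
    final, unanimous = [], 0
    for votes in annotations:
        label, count = Counter(votes).most_common(1)[0]
        final.append(label)
        if count == len(votes):  # every annotator chose the same class
            unanimous += 1
    return final, unanimous / len(annotations)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy example (invented records, not from the dataset).
toy = [(C, C, C), (N, N, N), (N, C, N), (N, N, N)]
gold, agreement = aggregate(toy)
print(gold, agreement)
print(macro_f1(gold, [C, C, N, N]))
```

Macro averaging weights both classes equally, which matters here because only 18% of the records are clickbait. A metric dominated by the majority class would hide poor performance on the minority class.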
Keywords
Clickbait, Arabic, Fake News, Dataset, Machine Learning