An effective approach for Arabic document classification using machine learning

Global Transitions Proceedings (2022)

Abstract
Arabic text classification is an application of Natural Language Processing (NLP) used to analyze and categorize Arabic text. Text analysis has become an essential part of everyday life: the volume of text data keeps growing, which makes text classification a big data problem. Arabic text classification systems have become important for managing vital information in many domains, such as education, the health sector, and public services. In this work, Arabic text classification models are developed using several algorithms, namely Multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), Support Vector Classifier (SVC), Linear SVC, and Convolutional Neural Networks (CNN). These algorithms are implemented and evaluated on the Al-Khaleej dataset. Experiments are carried out with various text representation models, and the CNN with a character-level representation is observed to outperform the others. The CNN surpasses the state-of-the-art machine learning methods, reaching an accuracy of 98%. The presented methods should be useful in different domains, particularly for social media.
Keywords
Text Mining, Arabic language, Text Pre-processing, Representation, Document classification
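
To make the abstract's terms concrete, the following is a minimal sketch of one possible pairing: a character-level representation (overlapping character trigrams) fed to a Multinomial Naïve Bayes classifier with add-one smoothing. It is not the authors' implementation. The abstract pairs the character-level model with a CNN, and MNB is swapped in here only because it fits in a few lines of standard-library Python. The function `char_ngrams`, the class `MultinomialNB`, the trigram size, and the toy sentences are all assumptions made for illustration; none of the data comes from the Al-Khaleej dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    text = " ".join(text.split())  # collapse runs of whitespace
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class MultinomialNB:
    """Illustrative Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels, n=3):
        self.n = n
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = char_ngrams(doc, n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(count / total_docs)     # log prior
            for gram in char_ngrams(doc, self.n):
                if gram in self.vocab:               # skip n-grams unseen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, invented for this sketch.
train_docs = [
    "فاز الفريق في مباراة كرة القدم",
    "سجل اللاعب هدفا في المباراة",
    "ارتفعت أسعار النفط في الأسواق",
    "انخفضت أسهم البنوك في البورصة",
]
train_labels = ["sports", "sports", "economy", "economy"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict("خسر الفريق المباراة")
print(prediction)
```

Character n-grams are a common choice for Arabic because they capture shared stems across the language's rich prefix and suffix morphology without requiring a stemmer. A realistic experiment would need a held-out test split and a far larger corpus than this toy example.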