Investigate the Impact of Stemming on Mauritanian Dialect Classification using Machine Learning Techniques

Mohamed El Moustapha El Arby Chrif, Cheikhane Seyed, Cheikhne Mohamed Mahmoud, El Benany Mohamed Mahmoud, Fatimetou Mint Mohamed-Saleck, Moustapha Mohamed Saleck, Omar El Beqqali, Mohamedade Farouk Nanne

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS (2023)

Abstract
Despite the plethora and diversity of research on Natural Language Processing (NLP), a technique that allows computers to understand, generate, and manipulate human language, the field remains insufficiently developed, especially with regard to the processing of Arabic texts and their widely used dialects. The proposed approach applies machine learning techniques, taking into account evaluation criteria such as training, to comments written in the Mauritanian dialect and published on social media, notably Facebook, and compares the results of three algorithms: Random Forest (RF), Naïve Bayes Multinomial (NBM), and Logistic Regression (LR). We then study how these techniques behave when different stemmers are combined with other features, such as the tokenizers used to process the dataset. Major challenges exist: Arabic morphology is completely different from that of Latin-script languages, and there is no pre-existing dataset or dictionary with which to train the algorithms. Nevertheless, the experiments carried out in Weka show that the RF and NBM algorithms perform best with ArabicStemmerKhoja, reaching 96.37% and 71.40% respectively, whereas Logistic Regression performs best with the Null Stemmer, at 81.65%. With a light Arabic stemmer, all three techniques scored above 70%. This article presents a contribution to machine-learning-based NLP and describes a study that can help determine the best Arabic classifier.
Keywords
Machine learning, Natural Language Processing, Arabic text classification, Hassaniya dialect, Weka, stemming
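
To make the idea of comparing stemmers concrete, the following is a minimal sketch, not the authors' Weka pipeline. It assumes a hand-written multinomial Naive Bayes classifier and a hypothetical prefix-stripping function, `toy_light_stemmer`, which is neither the Khoja stemmer nor Weka's light stemmer. The four training comments and the two labels are invented toy data, chosen only to show how the choice of stemmer can change a prediction.

```python
import math
from collections import Counter, defaultdict

# Hypothetical prefix list for illustration only; this is NOT the Khoja
# stemmer or the light Arabic stemmer shipped with Weka.
PREFIXES = ("وال", "بال", "ال")


def null_stemmer(token):
    """Return the token unchanged (no stemming)."""
    return token


def toy_light_stemmer(token):
    """Strip one leading prefix if at least three characters remain."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token


def tokenize(text, stemmer):
    """Whitespace tokenizer followed by the chosen stemmer."""
    return [stemmer(tok) for tok in text.split()]


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, stemmer, alpha=1.0):
        self.stemmer = stemmer
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text, self.stemmer))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text, self.stemmer) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        # Sorted iteration makes ties resolve to the alphabetically first label.
        for label in sorted(self.class_counts):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data (not from the paper): two comments per hypothetical label.
train_texts = ["الكتاب مفيد", "المدرسة كبيرة", "الفريق قوي", "الملعب واسع"]
train_labels = ["education", "education", "sport", "sport"]
query = "ملعب فريق"  # same words as in training, but without the article

pred_null = MultinomialNB(null_stemmer).fit(train_texts, train_labels).predict(query)
pred_light = MultinomialNB(toy_light_stemmer).fit(train_texts, train_labels).predict(query)
print(pred_null)   # education (no known tokens, so the tie falls to the first label)
print(pred_light)  # sport
```

Without stemming, the query shares no token with the training vocabulary, so the classifier falls back on the class priors. Once the article is stripped, the query tokens match the training tokens and the prediction changes. Real stemmers and a real corpus would of course behave less cleanly than this toy case.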