Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts

Bilel Elayeb, Mohamed Firas Ettih, Raja Ayed

ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 3 (2022)

Abstract
The Arabic language is characterized by its complexity and by its morphological and orthographic variations, including the syntactic and semantic diversity of a single word. These properties can cause morphological ambiguity. In this paper, we present a new architecture for the morphological disambiguation of Arabic texts. Morphological disambiguation can be treated as a classification problem in which the values of the morphological features represent the classes, and a classification algorithm assigns a class to each occurrence of a word based on its context. The first step consists of identifying the correct morphological analysis of a non-vocalized Arabic word using the morphological dependencies extracted from a corpus of vocalized texts. We then propose a method for transforming imperfect training datasets into perfect data with precise attributes and certain classes. We evaluate this architecture with a set of machine-learning classifiers on a corpus of classical Arabic texts. The results show a statistically significant improvement in the disambiguation rate for the SVM and Naive Bayes classifiers.
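To make the classification framing concrete, the following is a minimal sketch, not the authors' architecture. It assumes a single morphological feature (part of speech) as the class, uses only the previous and next words as context attributes, and relies on a tiny hand-written Naive Bayes with Laplace smoothing. The transliterated sentences, the feature names, and the `NaiveBayes` class are all hypothetical and serve only as illustration; the paper's actual features, corpus, and data transformation step are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, i):
    """Describe the occurrence at position i by its neighbouring words (assumed feature set)."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [f"prev={prev_tok}", f"next={next_tok}"]


class NaiveBayes:
    """Toy multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for features, label in samples:
            self.class_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each context feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: the unvocalized form "ktb" can be read as a verb
# ("kataba", he wrote) or as a noun ("kutub", books) depending on context.
TRAIN = [
    (["qd", "ktb", "Alwld"], 1, "VERB"),
    (["qd", "ktb", "Alrjl"], 1, "VERB"),
    (["fy", "ktb", "Altaryx"], 1, "NOUN"),
    (["fy", "ktb", "AlnHw"], 1, "NOUN"),
]

samples = [(context_features(tokens, i), label) for tokens, i, label in TRAIN]
model = NaiveBayes().fit(samples)

# Classify two unseen occurrences of the same ambiguous form.
pred_verb = model.predict(context_features(["qd", "ktb", "Aldrs"], 1))
pred_noun = model.predict(context_features(["fy", "ktb", "Aldrs"], 1))
print(pred_verb, pred_noun)
```

In this sketch the preceding word alone decides the class; a realistic system would predict several morphological features and draw on a much richer context.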
Keywords
Morphological Disambiguation, Arabic Text, Machine-Learning Algorithms, Data Transformation, Morphological Feature, Classification