Towards Improving Rule-Based Arabic Root Extraction Algorithm for Non-Vocalized Text

Nisrean Thalji, Zyad Thalji, Walid Bani Hani

semanticscholar(2018)

Abstract
Rooting algorithms remove affixes from words and extract the root from which the input word is derived. Rooting helps standardize terms that refer to the same concept. These algorithms are widely used in Arabic language applications such as information retrieval systems, indexes, text mining, text classifiers, data compression, spelling checkers, text summarization, question answering systems, machine translation, part-of-speech tagging systems, stemmers, and morphological analyzers. Khoja’s algorithm is a standard Arabic root extraction algorithm, but it has a number of flaws. The proposed algorithm extends Khoja’s algorithm and resolves most of these flaws. Testing was conducted on Thalji’s corpus, which was built mainly to test and compare Arabic root extraction algorithms. This corpus contains 720,000 word-root pairs derived from 12,000 roots. On this corpus, the proposed algorithm obtained higher root extraction accuracy than Khoja’s algorithm: 92% versus 63%.

Keywords: root extraction, stem, rules, pattern, prefix, suffix, infix.
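
To make the idea of rule-based rooting concrete, here is a minimal Python sketch of the general approach described above: strip affixes, match the remaining stem against morphological patterns, and score the extractor on word-root pairs. It is not Khoja's algorithm or the proposed algorithm. The affix and pattern tables, and the function names `strip_affixes`, `match_pattern`, `extract_root`, and `accuracy`, are hypothetical and far smaller than anything a real system would use.

```python
# Toy, hypothetical rule tables (assumptions for illustration only).
PREFIXES = ["ال", "و"]
SUFFIXES = ["ون", "ات", "ة"]
# In a pattern, the letters ف, ع, ل mark the three root slots;
# every other letter must match the stem literally.
PATTERNS = ["مفعل", "فاعل", "مفعول"]
ROOT_SLOTS = "فعل"


def strip_affixes(word):
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def match_pattern(stem, pattern):
    """Return the letters in the root slots if the stem fits the pattern, else None."""
    if len(stem) != len(pattern):
        return None
    root = []
    for stem_char, pattern_char in zip(stem, pattern):
        if pattern_char in ROOT_SLOTS:
            root.append(stem_char)
        elif stem_char != pattern_char:
            return None
    return "".join(root)


def extract_root(word):
    """Strip affixes, then accept a 3-letter stem or try each pattern in order."""
    stem = strip_affixes(word)
    if len(stem) == 3:
        return stem
    for pattern in PATTERNS:
        root = match_pattern(stem, pattern)
        if root is not None:
            return root
    return None  # no rule applies


def accuracy(pairs):
    """Fraction of (word, expected_root) pairs whose root is extracted correctly."""
    correct = sum(1 for word, root in pairs if extract_root(word) == root)
    return correct / len(pairs)


# Tiny made-up sample; the last word is not covered by the toy pattern list.
sample_pairs = [
    ("المعلمون", "علم"),
    ("كاتب", "كتب"),
    ("مكتوب", "كتب"),
    ("الكتاب", "كتب"),
]
print(accuracy(sample_pairs))
```

The length guard in `strip_affixes` keeps a word such as ولد from losing its first root letter to the و prefix rule. Over-stripping of this kind is one of the general risks a rule-based extractor has to handle.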