A Hybrid Approach for Building Arabic Diacritizer

Semitic '09: Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages (2009)

Abstract
Modern Standard Arabic is usually written without diacritics, which makes Arabic text processing difficult. Diacritization clarifies the meaning of words and resolves ambiguous spellings or pronunciations, since some Arabic words are spelled the same but differ in meaning. In this paper, we address the problem of adding diacritics to undiacritized Arabic text using a hybrid approach. The approach requires an Arabic lexicon and a large corpus of fully diacritized text for training. Case endings are treated as a separate post-processing task that uses syntactic information. The hybrid approach relies on lexicon retrieval, bigram, and SVM-based statistical techniques, applied in order of priority. We present an evaluation of the proposed diacritization approach and discuss several modifications for improving its performance.
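The prioritized combination of techniques described above can be sketched as a simple cascade. This is a minimal illustration under stated assumptions, not the authors' implementation: words are written in Buckwalter transliteration; the rule "accept a lexicon entry only if it is the single candidate" is an assumed reading of "prioritized"; the SVM stage is replaced by a placeholder callable; case-ending post-processing is not modelled. The helpers `strip_diacritics`, `train_bigrams`, and `diacritize`, and the toy data `example_lexicon` and `example_corpus`, are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Assumption: Buckwalter transliteration, where a, i, u, o and ~ F N K
# are diacritic marks rather than consonants.
DIACRITIC_MARKS = set("aiuo~FNK")


def strip_diacritics(word: str) -> str:
    """Return the undiacritized form, used here as the lexicon lookup key."""
    return "".join(ch for ch in word if ch not in DIACRITIC_MARKS)


def train_bigrams(corpus: List[List[str]]) -> Counter:
    """Count (previous word, word) pairs in a fully diacritized corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        for prev, cur in zip(["<s>"] + sentence, sentence):
            counts[(prev, cur)] += 1
    return counts


def diacritize(
    words: List[str],
    lexicon: Dict[str, List[str]],
    bigrams: Counter,
    fallback: Callable[[str, str], str],
) -> List[str]:
    """Hypothetical cascade: lexicon first, then bigrams, then a fallback."""
    output: List[str] = []
    prev = "<s>"  # sentence-start marker, matching train_bigrams
    for word in words:
        key = strip_diacritics(word)
        candidates = lexicon.get(key, [])
        if len(candidates) == 1:
            # Priority 1: the lexicon lists exactly one diacritized form.
            choice = candidates[0]
        else:
            # Priority 2: the candidate seen most often after `prev`.
            # Ties are broken by string order, which is arbitrary.
            best_count, best = max(
                ((bigrams[(prev, c)], c) for c in candidates), default=(0, "")
            )
            if best_count > 0:
                choice = best
            else:
                # Priority 3: no lexicon or bigram evidence, so defer to the
                # fallback (a placeholder standing in for the SVM stage).
                choice = fallback(key, prev)
        output.append(choice)
        prev = choice
    return output


# Toy data invented for illustration; case endings are omitted because
# that step is handled separately in the described approach.
example_lexicon = {
    "Alwld": ["Alwalad"],                   # "the boy"
    "ktb": ["kataba", "kutiba", "kutub"],   # "wrote", "was written", "books"
}
example_corpus = [["Alwalad", "kataba"]]

if __name__ == "__main__":
    bigrams = train_bigrams(example_corpus)
    # Placeholder fallback: leave the word undiacritized.
    result = diacritize(["Alwld", "ktb", "Hdyv"], example_lexicon, bigrams,
                        lambda w, prev: w)
    print(result)  # prints ['Alwalad', 'kataba', 'Hdyv']
```

Here the ambiguous `ktb` is resolved by bigram evidence from the preceding word, while the out-of-lexicon `Hdyv` falls through to the placeholder. Putting the cheap, high-precision lexicon lookup first and reserving the classifier for words that remain unresolved is one plausible motivation for a prioritized design.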
Keywords
hybrid approach, Arabic lexicon, Arabic text processing, Arabic word, modern standard Arabic, proposed diacritization approach, undiacritized Arabic text, diacritized text, lexicon retrieval, separate post processing task, Arabic diacritizer