Morphology-Aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation

Ahmed Y. Tawfik, Mahitab Emam, Khaled Essam, Robert Nabil, Hany Hassan

Fourth Arabic Natural Language Processing Workshop (WANLP 2019), 2019

Abstract
Parallel corpora available for building machine translation (MT) models for dialectal Arabic (DA) are rather limited. This scarcity has prompted the use of abundant Modern Standard Arabic (MSA) resources to complement the limited dialectal resources. However, clitics often differ between MSA and DA. This paper compares morphology-aware DA word segmentation with other word segmentation approaches, such as Byte Pair Encoding (BPE) and Sub-word Regularization (SR). A set of experiments conducted on Egyptian Arabic (EA), Levantine Arabic (LA), and Gulf Arabic (GA) shows that a sufficiently accurate morphology-aware segmentation used in conjunction with BPE or SR outperforms the other word segmentation approaches.