The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation
arxiv(2021)
摘要
This paper evaluates the performance of several modern subword segmentation
methods in a low-resource neural machine translation setting. We compare
segmentations produced by applying BPE at the token or sentence level with
morphologically-based segmentations from LMVR and MORSEL. We evaluate
translation tasks between English and each of Nepali, Sinhala, and Kazakh, and
predict that using morphologically-based segmentation methods would lead to
better performance in this setting. However, comparing to BPE, we find that no
consistent and reliable differences emerge between the segmentation methods.
While morphologically-based methods outperform BPE in a few cases, what
performs best tends to vary across tasks, and the performance of segmentation
methods is often statistically indistinguishable.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要