Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
COLING (2024)
Abstract
This paper studies the effects of word-level linguistic annotations in
under-resourced neural machine translation, for which there is incomplete
evidence in the literature. The study covers eight language pairs, different
training corpus sizes, two architectures, and three types of annotation: dummy
tags (with no linguistic information at all), part-of-speech tags, and
morpho-syntactic description tags, which consist of part of speech and
morphological features. These linguistic annotations are interleaved in the
input or output streams as a single tag placed before each word. In order to
measure the performance under each scenario, we use automatic evaluation
metrics and perform automatic error classification. Our experiments show that,
in general, source-language annotations are helpful and morpho-syntactic
descriptions outperform part-of-speech tags for some language pairs. In
contrast, when words are annotated in the target language, part-of-speech tags
systematically outperform morpho-syntactic description tags in terms of
automatic evaluation metrics, even though the use of morpho-syntactic
description tags improves the grammaticality of the output. We provide a
detailed analysis of the reasons behind this result.
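
The interleaving scheme described above (a single tag placed before each word) can be illustrated with a minimal sketch. The function names, the tag strings such as `DET` or `NOUN|Number=Sing`, and the `<TAG>` dummy symbol are assumptions made for illustration only; the abstract does not specify the tagger, tag inventory, or preprocessing actually used.

```python
from typing import List, Sequence


def interleave_tags(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Return a token stream with one tag placed before each word."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    stream: List[str] = []
    for word, tag in zip(words, tags):
        stream.append(tag)   # the annotation comes first
        stream.append(word)  # followed by the word it describes
    return stream


def dummy_tags(words: Sequence[str], dummy: str = "<TAG>") -> List[str]:
    """Build tags that carry no linguistic information (hypothetical symbol)."""
    return [dummy] * len(words)


words = ["the", "cats", "sleep"]

# Part-of-speech annotation (illustrative tag names).
pos_stream = interleave_tags(words, ["DET", "NOUN", "VERB"])

# Morpho-syntactic description: part of speech plus morphological features.
msd_stream = interleave_tags(
    words, ["DET", "NOUN|Number=Plur", "VERB|Number=Plur"]
)

# Dummy annotation: same stream shape, no linguistic content.
dummy_stream = interleave_tags(words, dummy_tags(words))

print(" ".join(pos_stream))
print(" ".join(msd_stream))
print(" ".join(dummy_stream))
```

In this sketch the three annotation types produce streams of identical length, so any difference in translation quality between them would come from the information in the tags rather than from the change in sequence length, which is presumably the role of the dummy-tag condition.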