Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains
CoRR (2024)
Abstract
We introduce a new, extensive multidimensional quality metrics (MQM)
annotated dataset covering 11 language pairs in the biomedical domain. We use
this dataset to investigate whether machine translation (MT) metrics that are
fine-tuned on human-generated MT quality judgments are robust to domain shifts
between training and inference. We find that fine-tuned metrics exhibit a
substantial performance drop in the unseen domain scenario relative to metrics
that rely on the surface form, as well as pre-trained metrics which are not
fine-tuned on MT quality judgments.