KC4MT: A High-Quality Corpus for Multilingual Machine Translation.

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
Multilingual parallel corpora are an important resource for many natural language processing (NLP) applications. For machine translation, the size and quality of the training corpus largely determine the quality of the translation models. In this work, we present a method for building a high-quality multilingual parallel corpus in the news domain for several low-resource languages, including Vietnamese, Lao, and Khmer, to improve the quality of multilingual machine translation for these languages. We also publicly release the corpus, which includes 500,000 Vietnamese-Chinese bilingual sentence pairs, 150,000 Vietnamese-Lao bilingual sentence pairs, and 150,000 Vietnamese-Khmer bilingual sentence pairs.
Keywords
Multilingual parallel corpus, low-resource languages, language resource, parallel corpus, machine translation