Investigation of Data Augmentation Techniques for Assamese-English Language Pair Machine Translation

2023 18th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP) (2023)

Abstract
Machine translation can deliver impressive results when a substantial parallel corpus is available. However, in a country like India, with its linguistic diversity and many languages of distinct origins and scripts, most languages are short of resources, which makes it hard to build high-quality translation models. This work investigates a neural machine translation system with data augmentation techniques to improve translation quality for an extremely resource-constrained language pair, English-Assamese. We experiment with back-translation, tagged back-translation, iterative back-translation, and iterative tagged back-translation. We perform a qualitative and quantitative analysis of these data augmentation techniques. Furthermore, human evaluation is carried out to assess the adequacy and fluency of the translations. Empirical results show that data augmentation via iterative back-translation and the tagged approach enhances translation performance in extremely low-resource settings.
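As an illustration of the augmentation step named in the abstract, here is a minimal sketch of plain and tagged back-translation. It is not the authors' implementation. The names `back_translate`, `augment`, `toy_reverse`, and the tag string `<BT>` are hypothetical, and `toy_reverse` is only a stand-in for a trained target-to-source model. The idea is that target-side monolingual sentences are translated back into the source language, and in the tagged variant each synthetic source is prefixed with a marker token so the model can tell synthetic from genuine data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

BT_TAG = "<BT>"  # hypothetical marker token for synthetic source sentences


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
    tagged: bool = False,
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text."""
    pairs: List[Pair] = []
    for target in target_monolingual:
        # The reverse (target-to-source) model produces a synthetic source.
        synthetic_source = reverse_translate(target)
        if tagged:
            # Tagged variant: mark the synthetic source with a prefix token.
            synthetic_source = f"{BT_TAG} {synthetic_source}"
        pairs.append((synthetic_source, target))
    return pairs


def augment(
    parallel: List[Pair],
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
    tagged: bool = False,
) -> List[Pair]:
    """Return the genuine parallel data followed by the synthetic pairs."""
    return list(parallel) + back_translate(target_monolingual, reverse_translate, tagged)


def toy_reverse(sentence: str) -> str:
    """Placeholder for a trained reverse model (assumption for this sketch)."""
    return f"src({sentence})"


parallel = [("source sentence 1", "target sentence 1")]
monolingual = ["target sentence 2"]
augmented = augment(parallel, monolingual, toy_reverse, tagged=True)
```

In an iterative setup, the forward and reverse models would be retrained in alternation on data augmented this way, with each round's improved model regenerating the synthetic pairs for the other direction.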
Keywords
Neural Machine Translation, Back-translation, Iterative Back-translation, NLP, BLEU