Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Multimedia Tools and Applications(2024)

引用 0|浏览1
暂无评分
摘要
In an effort to enhance the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for Manipuri → English, Manipuri → Hindi and Manipuri → German language pairs. Manipuri is a morphologically rich and resource-constrained language with limited resources that can be computationally utilized. No such MMT dataset has not been reported for these language pairs till date. To build the parallel datasets, we collected news articles containing images and associated text in English from a local daily newspaper and used English as a pivot language. The machine-translated outputs of the existing translation systems of these languages go through manual post-editing to build the datasets. In addition to text, we build MT systems by exploiting features from images and audio recordings in the source language, i.e., Manipuri. We carried out an extensive analysis of the MT systems trained with text-only and multimodal inputs using automatic metrics and human evaluation techniques. Our findings attest that integrating multiple correlated modalities enhances the MT system performance in low-resource settings achieving a significant improvement of up to +3 BLEU score. The human assessment revealed that the fluency score of the MMT systems depends on the type of correlated auxiliary modality.
更多
查看译文
关键词
Multimodal machine translation,Image-guided,Speech-guided,Low-resource,Manipuri,Hindi,Neural machine translation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要