Minimum Bayes-Risk Phrase Table Pruning For Pivot-Based Machine Translation In Internet Of Things

IEEE Access (2018)

Abstract
Machine translation, which will be widely used in human-computer interaction services for the Internet of Things (IoT), is a key technology in the field of artificial intelligence. This paper presents a minimum Bayes-risk (MBR) phrase table pruning method for pivot-based statistical machine translation (SMT). An SMT system requires a large amount of bilingual data to build a high-performance translation model. For some language pairs, such as Chinese-English, massive bilingual data are available on the web. For most language pairs, however, large-scale bilingual data are hard to obtain. Pivot-based SMT has been proposed to solve this data scarcity problem: it introduces a pivot language to bridge the source language and the target language. With this approach, a source-target translation model can be derived from well-trained source-pivot and pivot-target translation models. However, because of ambiguities in the pivot language, source and target phrases with different meanings may be wrongly matched, so the derived source-target phrase table may contain incorrect phrase pairs. To alleviate this problem, we apply the MBR method to prune the phrase table. The MBR pruning method removes the phrase pairs with the lowest risk from the phrase table. Experimental results on Europarl data show that the proposed method both reduces the size of the phrase tables and improves translation performance. This study also provides a useful reference for many IoT research fields and smart web services.
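The pivot-based derivation described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used triangulation formula p(t|s) = Σ_p p(p|s) · p(t|p), which the abstract itself does not spell out, and the function name `triangulate` and the toy German-English-French entries are hypothetical, chosen only to show how an ambiguous pivot phrase can produce a wrong source-target pair. The MBR risk computation is not specified in the abstract and is therefore not sketched here.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Derive a source-target table by summing over shared pivot phrases.

    Assumed formula (not given in the abstract):
        p(t|s) = sum over pivot phrases p of p(p|s) * p(t|p)
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Every target phrase reachable through pivot p gets probability mass.
            for t, prob_t_given_p in pivot_tgt.get(p, {}).items():
                src_tgt[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in src_tgt.items()}


# Hypothetical toy tables: German source, English pivot, French target.
# "Ufer" means a river bank, but the English pivot "bank" is ambiguous.
src_pivot = {"Ufer": {"bank": 1.0}}
pivot_tgt = {"bank": {"rive": 0.4, "banque": 0.6}}

table = triangulate(src_pivot, pivot_tgt)
for target, prob in sorted(table["Ufer"].items()):
    print(f"Ufer -> {target}: {prob:.2f}")
```

In this toy case the derived table pairs "Ufer" with "banque" (a financial institution), which is the kind of incorrect phrase pair that a pruning step is meant to deal with.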
Keywords
Internet of Things, smart services, minimum Bayes risk, pivot-based SMT, phrase table pruning, statistical machine translation