Topic Model Based Adaptation Data Selection for Domain-Specific Machine Translation.

Communications in Computer and Information Science (2016)

Abstract
Current domain-specific machine translation (MT) suffers from a lack of high-quality bilingual corpora. Existing work in this field has shown the advantage of adaptation data selection (Ada-selection) for enriching such corpora. Encouraged by the empirical finding that topic distribution is conducive to characterizing a distinctive domain, we propose to use a topic model to improve Ada-selection. Based on a joint LDA approach, we incorporate topic distribution in measuring the relevance between the target domain and the candidate parallel sentence pairs. On this basis, we select the highly relevant candidates as high-quality domain-specific bilingual corpora. In practice, we apply our method to the acquisition of domain-specific corpora from general-domain data. Experiments on an end-to-end domain-specific MT task show that our method outperforms the state of the art, yielding gains of at least 1.5 BLEU points at different scales of training data.
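
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which relevance measure is used, so the Jensen-Shannon divergence below is an assumption chosen only for illustration, as are the function names `js_divergence` and `select_relevant` and the toy data. The topic distributions are assumed to have been inferred already (for example, by a joint LDA model); the sketch covers only the ranking and selection of candidate sentence pairs.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Assumption: this measure is an illustrative stand-in; the paper's
    abstract does not specify how topic relevance is computed.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_relevant(domain_topics, candidates, top_k):
    """Return the top_k sentence pairs whose topic distribution is closest
    to the target-domain topic distribution.

    candidates: list of (sentence_pair, topic_distribution) tuples.
    """
    scored = [
        (1.0 - js_divergence(domain_topics, topics), pair)
        for pair, topics in candidates
    ]
    # Higher score means more relevant to the target domain.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:top_k]]


# Toy data (hypothetical): three topics, three general-domain candidates.
domain_topics = [0.7, 0.2, 0.1]
candidates = [
    (("src sentence A", "tgt sentence A"), [0.6, 0.3, 0.1]),
    (("src sentence B", "tgt sentence B"), [0.1, 0.1, 0.8]),
    (("src sentence C", "tgt sentence C"), [0.7, 0.2, 0.1]),
]
selected = select_relevant(domain_topics, candidates, top_k=2)
```

In this toy setup, candidate C matches the target-domain distribution exactly and candidate A is close to it, so both are kept, while candidate B, dominated by a different topic, is dropped.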
Keywords
Statistical machine translation, Specific-domain machine translation, Topic model, Data selection