Modeling relevance in statistical machine translation: scoring alignment, context, and annotations of translation instances (2012)

Abstract
Machine translation has advanced considerably in recent years, primarily due to the availability of larger datasets. However, one cannot rely on the availability of copious, high-quality bilingual training data. In this work, we improve upon the state of the art in machine translation with an instance-based model that scores each instance of translation in the corpus. A translation instance reflects a source and target correspondence at one specific location in the corpus. The significance of this approach is that our model can capture the fact that some instances of translation are more relevant than others. We have implemented this approach in Cunei, a new platform for machine translation that permits the scoring of instance-specific features. Leveraging per-instance alignment features, we demonstrate that Cunei can outperform Moses, a widely used machine translation system. We then expand on this baseline system in three principal directions, each of which shows further gains. First, we score the source context of a translation instance in order to favor those that are most similar to the input sentence. Second, we apply similar techniques to score the target context of a translation instance and favor those that are most similar to the target hypothesis. Third, we provide a mechanism to mark up the corpus with annotations (e.g., statistical word clustering, part-of-speech labels, and parse trees) and then exploit this information to create additional per-instance similarity features. Each of these techniques explicitly takes advantage of the fact that our approach scores each instance of translation on demand after the input sentence is provided and while the target hypothesis is being generated; similar extensions would be impossible or quite difficult in existing machine translation systems. Ultimately, this approach provides a more flexible framework for integration of novel features that adapts better to new data.
In our experiments with German-English and Czech-English translation, the addition of instance-specific features consistently shows improvement.
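
As a rough illustration of what scoring a translation instance by its source context could look like, the sketch below combines a per-instance alignment feature with a context-similarity feature in a weighted sum. It is not Cunei's implementation: the `TranslationInstance` class, the `context_overlap` measure (Jaccard overlap of neighbouring words), the window size, the weights, and the toy German-English data are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class TranslationInstance:
    """One occurrence of a source phrase in the corpus (hypothetical structure)."""
    source_sentence: List[str]  # tokenized corpus sentence containing the phrase
    start: int                  # index of the first token of the phrase
    end: int                    # index one past the last token of the phrase
    target_phrase: str          # translation aligned to this occurrence
    alignment_prob: float       # assumed per-instance alignment confidence in (0, 1]


def neighbours(tokens: List[str], start: int, end: int, window: int) -> Set[str]:
    """Words within `window` positions to the left and right of tokens[start:end]."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return set(left) | set(right)


def context_overlap(instance: TranslationInstance, input_tokens: List[str],
                    input_start: int, input_end: int, window: int = 2) -> float:
    """Jaccard overlap between the instance's context and the input's context."""
    inst_ctx = neighbours(instance.source_sentence, instance.start, instance.end, window)
    input_ctx = neighbours(input_tokens, input_start, input_end, window)
    if not inst_ctx or not input_ctx:
        return 0.0
    return len(inst_ctx & input_ctx) / len(inst_ctx | input_ctx)


def score_instance(instance: TranslationInstance, input_tokens: List[str],
                   input_start: int, input_end: int,
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-instance features, computed once the input is known."""
    features = {
        "log_alignment": math.log(instance.alignment_prob),
        "source_context": context_overlap(instance, input_tokens, input_start, input_end),
    }
    return sum(weights[name] * value for name, value in features.items())


# Toy data: the German word "Schloss" can mean "lock" or "castle".
input_tokens = ["das", "Schloss", "an", "der", "Tür", "klemmt"]
lock_instance = TranslationInstance(
    ["das", "Schloss", "an", "der", "Tür", "ist", "alt"], 1, 2, "lock", 0.5)
castle_instance = TranslationInstance(
    ["wir", "besuchten", "das", "Schloss", "am", "See"], 3, 4, "castle", 0.5)

weights = {"log_alignment": 1.0, "source_context": 2.0}  # illustrative values only
best = max([lock_instance, castle_instance],
           key=lambda inst: score_instance(inst, input_tokens, 1, 2, weights))
print(best.target_phrase)
```

Both instances carry the same alignment confidence here, so the ranking is decided entirely by how closely each instance's surrounding words match the input sentence. Because the context feature depends on the input, it can only be computed at translation time, which mirrors the on-demand scoring the abstract describes.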
Keywords
statistical machine translation, similar extension, input sentence, instance-specific feature, target hypothesis, Czech-English translation, machine translation system, machine translation, approach score, widely-used machine translation system, scoring alignment, translation instance