Joint Training for Pivot-based Neural Machine Translation
IJCAI, pp. 3974-3980, 2017.
Keywords:
Natural Language Processing: Machine Translation; translation model; large-scale parallel corpora; parallel corpora; pivot language
Abstract:
While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot ...
Introduction
- Recent years have witnessed the rapid development of neural machine translation (NMT) [Sutskever et al., 2014; Bahdanau et al., 2015], which advocates the use of neural networks to directly model the translation process in an end-to-end way.
- There still remains a major challenge for NMT: large-scale parallel corpora are usually non-existent for most language pairs.
- This is unfortunate because NMT is a data-hungry approach and requires a large amount of data to fully train translation models.
- One can assume that there exists a third language, called the pivot, for which source-pivot and pivot-target parallel corpora are available.
- The source-to-target model can be decomposed into two sub-models by treating the pivot sentence as a latent variable:
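A sketch of this decomposition in our own notation (the standard pivot formulation; $\mathbf{x}$ is the source sentence, $\mathbf{z}$ the pivot sentence, and $\mathbf{y}$ the target sentence):

$$P(\mathbf{y} \mid \mathbf{x};\, \theta_{x \to z}, \theta_{z \to y}) \;=\; \sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x};\, \theta_{x \to z})\, P(\mathbf{y} \mid \mathbf{z};\, \theta_{z \to y})$$

Since summing over all possible pivot sentences is intractable, pivot-based translation in practice approximates the marginal, e.g., by translating the source into a 1-best pivot sentence and then translating that into the target.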
Highlights
- Recent years have witnessed the rapid development of neural machine translation (NMT) [Sutskever et al., 2014; Bahdanau et al., 2015], which advocates the use of neural networks to directly model the translation process in an end-to-end way
- To further verify its effectiveness, we evaluate all the methods on the WMT corpus, which is much larger than Europarl
- As bridging corpora are used in the likelihood connection term to “bridge” the source-to-pivot and pivot-to-target translation models, why not directly build neural machine translation systems with these corpora?
- We present joint training for pivot-based neural machine translation
Methods
- 4.1 Setup: The authors evaluated the approach on two translation tasks, using the Europarl and WMT corpora. (The sentence counts, word counts, and vocabulary sizes of the es-en, de-en, and en-fr datasets are given in Table 1.)
- 1. Spanish-English-French: Spanish as the source language, English as the pivot language, and French as the target language.
- 2. German-English-French: German as the source language, English as the pivot language, and French as the target language.
- Table 1 shows the statistics of the Europarl and WMT corpora used in the experiments.
- The authors remove empty lines and retain sentence pairs with no more than 50 words.
- To avoid any intersection of the source-pivot and pivot-target corpora, the authors split the overlapping pivot-language sentences of the source-to-pivot and pivot-to-target corpora into two separate parts of equal size and merge them separately with the non-overlapping parts for each language pair (see the sketch below).
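A minimal Python sketch of this preprocessing (the function names, the in-memory corpus representation, and the alternating assignment of overlapping pivot sentences are our assumptions; the paper only states that the overlap is split into two equal parts):

```python
def filter_pairs(pairs, max_len=50):
    """Drop empty lines and retain pairs with at most max_len words per side."""
    return [(s, t) for s, t in pairs
            if s.strip() and t.strip()
            and len(s.split()) <= max_len and len(t.split()) <= max_len]

def split_overlap(src_pvt, pvt_tgt):
    """Assign each pivot sentence occurring in both corpora to exactly one
    of them, so source-pivot and pivot-target share no pivot sentences."""
    overlap = {p for _, p in src_pvt} & {p for p, _ in pvt_tgt}
    half = set(sorted(overlap)[::2])   # half of the overlap stays with source-pivot
    other_half = overlap - half        # the other half stays with pivot-target
    src_pvt = [(s, p) for s, p in src_pvt if p not in other_half]
    pvt_tgt = [(p, t) for p, t in pvt_tgt if p not in half]
    return src_pvt, pvt_tgt
```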
Results
- Results on the Europarl Corpus
Table 2 shows the comparison between joint training with the three connection terms and independent training on the Europarl corpus.
- In the Spanish-to-French translation task, the soft connection achieves significant improvements in the Spanish-to-French and Spanish-to-English directions, while the hard connection still performs comparably with independent training.
- In the German-to-French translation task, the soft and hard connections achieve performance comparable to independent training.
- As bridging corpora are used in the likelihood connection term to “bridge” the source-to-pivot and pivot-to-target translation models, why not directly build NMT systems with these corpora? (A sketch of the joint objective with its connection term follows after this list.)
- The authors train source-to-target models using bridging corpora and show translation results in Table 6.
- This indicates that NMT yields poor performance on low-resource language pairs and that the pivot-based translation strategy effectively alleviates data scarcity.
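For context, a hedged sketch of the joint training objective in our notation (the general form couples the two models through a connection term $\mathcal{J}$; the specific form shown for the likelihood connection is our reading of the description above, not a verbatim reproduction of the paper's equations):

$$J(\theta_{x \to z}, \theta_{z \to y}) \;=\; L(\theta_{x \to z}) + L(\theta_{z \to y}) + \lambda\, \mathcal{J}(\theta_{x \to z}, \theta_{z \to y})$$

where the first two terms are the log-likelihoods of the source-pivot and pivot-target corpora. For the likelihood connection, given a small source-target bridging corpus $D_{x,y}$, one plausible form maximizes the expected target likelihood under pivot translations sampled from the source-to-pivot model:

$$\mathcal{J} \;=\; \sum_{(\mathbf{x}, \mathbf{y}) \in D_{x,y}} \mathbb{E}_{\mathbf{z} \sim P(\mathbf{z} \mid \mathbf{x};\, \theta_{x \to z})}\big[\log P(\mathbf{y} \mid \mathbf{z};\, \theta_{z \to y})\big]$$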
Conclusion
- The authors present joint training for pivot-based neural machine translation.
- It is appealing to combine source and pivot sentences for decoding target sentences [Firat et al., 2016] or to train a multi-source model directly [Zoph and Knight, 2016].
- The authors plan to study better connection terms for the joint training.
Tables
- Table1: Characteristics of Spanish-English, German-English and English-French datasets on the Europarl and WMT corpora. “es” denotes Spanish, “en” denotes English, “de” denotes German, and “fr” denotes French
- Table2: Comparison between independent and joint training on Spanish-French and German-French translation tasks using the Europarl corpus. English is treated as the pivot language. The BLEU scores are case-insensitive. “*”: significantly better than independent training (p < 0.05); “**”: significantly better than independent training (p < 0.01). We use the statistical significance test with paired bootstrap resampling [Koehn, 2004] (see the sketch after this list)
- Table3: Examples of pivot and target translations using the pivot-based translation strategy. We observe that our approaches generate better translations for both pivot and target sentences. We italicize correct translation segments that are no shorter than 2-grams
- Table4: Results on the Spanish-French translation task from the WMT corpus. English is treated as the pivot language. “**”: significantly better than independent training (p < 0.01)
- Table5: Comparison with Firat et al. [2016] on the Spanish-French translation task from the WMT corpus. “**”: significantly better than independent training (p < 0.01)
- Table6: Translation performance on bridging corpora
- Table7: Effect of the data size of source-to-target parallel corpora (Bridge Corpora) used in LIKELIHOOD
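Because Tables 2, 4, and 5 report significance via paired bootstrap resampling [Koehn, 2004], here is a minimal Python sketch of that test (the `corpus_bleu(hyps, refs)` scoring interface and all names are our assumptions):

```python
import random

def paired_bootstrap(refs, hyps_a, hyps_b, corpus_bleu, n_samples=1000, seed=0):
    """Paired bootstrap resampling [Koehn, 2004]: resample the test set with
    replacement and count how often system A outscores system B."""
    rng = random.Random(seed)
    n = len(refs)
    wins_a = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]   # resample sentence indices
        sample_refs = [refs[i] for i in idx]
        score_a = corpus_bleu([hyps_a[i] for i in idx], sample_refs)
        score_b = corpus_bleu([hyps_b[i] for i in idx], sample_refs)
        if score_a > score_b:
            wins_a += 1
    # One-sided p-value: fraction of resampled sets where A did not beat B.
    return 1.0 - wins_a / n_samples
```

A p-value below 0.05 (or 0.01) corresponds to the “*” (or “**”) markers in the tables.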
Related work
- Our work is inspired by two lines of research: (1) machine translation with pivot languages and (2) incorporating additional data resource for NMT.
5.1 Machine Translation with Pivot Languages
Machine translation suffers from the scarcity of parallel corpora. For low-resource language pairs, a pivot language is introduced to “bridge” source and target languages in SMT [Cohn and Lapata, 2007; Wu and Wang, 2007; Utiyama and Isahara, 2007; Zahabi et al., 2013; El Kholy et al., 2013].
In NMT, Firat et al. [2016] and Johnson et al. [2016] propose multi-way, multilingual NMT models that enable zero-resource machine translation. They also apply pivot-based approaches in NMT to improve its performance. Zoph et al. [2016] adopt transfer learning to fine-tune parameters for low-resource language pairs using parameters trained on high-resource language pairs. In contrast, our approach aims to jointly train source-to-pivot and pivot-to-target NMT models, which can alleviate the error propagation of pivot-based approaches. We use connection terms to “bridge” these two models so that they benefit from each other.
Funding
- This research is supported by the 863 Program (2015AA015407), the National Natural Science Foundation of China (Nos. 61532001, 61522204, and 61432013), Tsinghua Initiative Research Program grant 20151080475, MOE Online Education Research Center (Quantong Fund) grant 2017ZD203, and gift funds from Huawei and Ant Financial.
References
- [Bahdanau et al., 2015] Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR, 2015.
- [Bertoldi et al., 2008] Nicola Bertoldi, Madalina Barbaiani, Marcello Federico, and Roldano Cattoni. Phrase-based statistical machine translation with pivot languages. In IWSLT, 2008.
- [Cheng et al., 2016] Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. Semisupervised learning for neural machine translation. In Proceedings of ACL, 2016.
- [Cohn and Lapata, 2007] Trevor Cohn and Mirella Lapata. Machine translation by triangulation: Making effective use of multi-parallel corpora. In Proceedings of ACL, 2007.
- [El Kholy et al., 2013] Ahmed El Kholy, Nizar Habash, Gregor Leusch, Evgeny Matusov, and Hassan Sawaf. Language independent connectivity strength features for phrase pivot statistical machine translation. In Proceedings of ACL, 2013.
- [Firat et al., 2016] Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T Yarman Vural, and Kyunghyun Cho. Zero-resource translation with multi-lingual neural machine translation. In Proceedings of EMNLP, 2016.
- [Gulcehre et al., 2015] Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. On using monolingual corpora in neural machine translation. arXiv:1503.03535 [cs.CL], 2015.
- [Jean et al., 2015] Sebastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. On using very large target vocabulary for neural machine translation. In Proceedings of ACL, 2015.
- [Johnson et al., 2016] Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viegas, Martin Wattenberg, Greg Corrado, et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation. arXiv preprint arXiv:1611.04558, 2016.
- [Junczys-Dowmunt et al., 2016] Marcin Junczys-Dowmunt, Tomasz Dwojak, and Hieu Hoang. Is neural machine translation ready for deployment? a case study on 30 translation directions. arXiv:1610.01108v2, 2016.
- [Koehn, 2004] Philipp Koehn. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP, 2004.
- [Papineni et al., 2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of ACL, 2002.
- [Sennrich et al., 2016a] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In Proceedings of ACL, 2016.
- [Sennrich et al., 2016b] Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of ACL, 2016.
- [Shen et al., 2016] Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. Minimum risk training for neural machine translation. In Proceedings of ACL, 2016.
- [Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Proceedings of NIPS, 2014.
- [Utiyama and Isahara, 2007] Masao Utiyama and Hitoshi Isahara. A comparison of pivot methods for phrase-based statistical machine translation. In HLT-NAACL, 2007.
- [Wu and Wang, 2007] Hua Wu and Haifeng Wang. Pivot language approach for phrase-based statistical machine translation. Machine Translation, 2007.
- [Zahabi et al., 2013] Samira Tofighi Zahabi, Somayeh Bakhshaei, and Shahram Khadivi. Using context vectors in improving a machine translation system with bridge language. In Proceedings of ACL, 2013.
- [Zhang and Zong, 2016] Jiajun Zhang and Chengqing Zong. Exploiting source-side monolingual data in neural machine translation. In Proceedings of EMNLP, 2016.
- [Zoph and Knight, 2016] Barret Zoph and Kevin Knight. Multi-source neural translation. In Proceedings of NAACL, 2016.
- [Zoph et al., 2016] Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. Transfer learning for low-resource neural machine translation. In Proceedings of EMNLP, 2016.