Joint Training for Pivot-based Neural Machine Translation

IJCAI, pp. 3974-3980, 2017.

Keywords:
Natural Language Processing: Machine Translation; translation model; large-scale parallel corpora; parallel corpora; pivot language
Weibo:
As the bridging corpora are used in the likelihood connection term for “bridging” the source-to-pivot and pivot-to-target translation models, why not directly build neural machine translation systems with these corpora?

Abstract:

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually trained independently. This paper introduces joint training for pivot-based neural machine translation, using connection terms that let the two models interact during training; experiments on the Europarl and WMT corpora show significant improvements over independent training.

Introduction
  • Recent years have witnessed the rapid development of neural machine translation (NMT) [Sutskever et al., 2014; Bahdanau et al., 2015], which advocates the use of neural networks to directly model the translation process in an end-to-end way.
  • There still remains a major challenge for NMT: large-scale parallel corpora are usually non-existent for most language pairs.
  • This is unfortunate because NMT is a data-hungry approach and requires a large amount of data to fully train translation models.
  • One can assume that there exists a third language, called the pivot, for which both source-pivot and pivot-target parallel corpora are available.
  • The source-to-target model can be decomposed into two sub-models by treating the pivot sentence as a latent variable: $P(\mathbf{y} \mid \mathbf{x}) = \sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x}; \theta_{x \to z})\, P(\mathbf{y} \mid \mathbf{z}; \theta_{z \to y})$, where $\mathbf{z}$ ranges over pivot-language sentences.
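This decomposition suggests the standard two-step pivot decoding scheme. Below is a minimal sketch; the `src2pvt_model` and `pvt2tgt_model` objects and their `beam_search` method (returning hypothesis/log-probability pairs) are hypothetical stand-ins for any trained seq2seq models, not the paper's code:

```python
# Minimal sketch of pivot-based decoding under the latent-variable
# decomposition P(y|x) = sum_z P(z|x) P(y|z). The model objects and their
# beam_search(sentence, beam_size) interface are hypothetical.

def pivot_translate(src_sentence, src2pvt_model, pvt2tgt_model, beam_size=5):
    # Step 1: decode beam_size pivot candidates z with scores log P(z|x).
    pivot_hyps = src2pvt_model.beam_search(src_sentence, beam_size)

    # Step 2: decode the target from each pivot candidate, keeping the
    # hypothesis that maximizes log P(z|x) + log P(y|z).
    best_target, best_score = None, float("-inf")
    for pivot_sent, pivot_logprob in pivot_hyps:
        for target_sent, target_logprob in pvt2tgt_model.beam_search(
                pivot_sent, beam_size):
            score = pivot_logprob + target_logprob
            if score > best_score:
                best_target, best_score = target_sent, score
    return best_target, best_score
```

Because the sum over all pivot sentences is intractable, the sketch keeps only the top pivot candidates; pivot-side errors therefore propagate into the target translation, which is exactly the weakness that joint training targets.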
Highlights
  • Recent years have witnessed the rapid development of neural machine translation (NMT) [Sutskever et al., 2014; Bahdanau et al., 2015], which advocates the use of neural networks to directly model the translation process in an end-to-end way
  • To further verify its effectiveness, we evaluate all the methods on the WMT corpus, which is much larger than Europarl
  • As the bridging corpora are used in the likelihood connection term for “bridging” the source-to-pivot and pivot-to-target translation models, why not directly build neural machine translation systems with these corpora?
  • We present joint training for pivot-based neural machine translation
Methods
  • 4.1 Setup. The authors evaluated the approach on two translation tasks, using the Europarl and WMT corpora. Table 1 reports the sentence, word, and vocabulary counts (# Sent., # Word, Vocab.) for each language pair: es-en, de-en, and en-fr on Europarl, and es-en and en-fr on WMT.
  • The two tasks are: (1) Spanish-English-French, with Spanish as the source language, English as the pivot language, and French as the target language; and (2) German-English-French, with German as the source language, English as the pivot language, and French as the target language.
  • Table 1 shows the statistics of the Europarl and WMT corpora used in the experiments.
  • The authors remove empty lines and retain only sentence pairs with no more than 50 words.
  • To avoid overlap between the source-pivot and pivot-target corpora, the authors split the pivot-language sentences shared by the two corpora into two equal-sized parts and merge each part with the non-overlapping sentences of the corresponding language pair (a sketch of this step follows this list)
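A minimal sketch of this preprocessing, assuming the corpora are held in memory as lists of sentence-pair tuples; the function names and data layout are illustrative, not the authors' code:

```python
# Sketch of the corpus preparation described above. Filters empty lines and
# over-long pairs, then assigns each pivot sentence shared by the
# source-pivot and pivot-target corpora to exactly one of the two corpora,
# so that the two training sets have disjoint pivot sides.

def filter_pairs(pairs, max_len=50):
    """Keep non-empty sentence pairs with at most max_len words per side."""
    return [(s, t) for s, t in pairs
            if s.strip() and t.strip()
            and len(s.split()) <= max_len and len(t.split()) <= max_len]

def split_overlap(src_pvt_pairs, pvt_tgt_pairs):
    """Split overlapping pivot sentences into two equal disjoint halves."""
    pvt_in_sp = {p for _, p in src_pvt_pairs}   # pivots on source-pivot side
    pvt_in_pt = {p for p, _ in pvt_tgt_pairs}   # pivots on pivot-target side
    overlap = sorted(pvt_in_sp & pvt_in_pt)
    to_sp = set(overlap[: len(overlap) // 2])   # half kept for source-pivot
    to_pt = set(overlap[len(overlap) // 2:])    # half kept for pivot-target
    src_pvt = [(s, p) for s, p in src_pvt_pairs if p not in to_pt]
    pvt_tgt = [(p, t) for p, t in pvt_tgt_pairs if p not in to_sp]
    return src_pvt, pvt_tgt
```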
Results
  • Results on the Europarl Corpus

    Table 2 compares joint training with the three connection terms against independent training on the Europarl corpus.
  • In the Spanish-to-French translation task, the soft connection achieves significant improvements in both the Spanish-to-French and Spanish-to-English directions, while the hard connection still performs comparably with independent training.
  • In the German-to-French translation task, the soft and hard connections achieve performance comparable to independent training.
  • As the bridging corpora are used in the likelihood connection term for “bridging” the source-to-pivot and pivot-to-target translation models, why not directly build NMT systems with these corpora? (A sketch of the likelihood connection follows this list.)
  • The authors train source-to-target models directly on the bridging corpora and report the translation results in Table 6.
  • The results indicate that NMT yields poor performance on low-resource language pairs and that the pivot-based translation strategy effectively alleviates data scarcity
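The likelihood connection mentioned above suggests the general shape of the joint objective: two maximum-likelihood terms on the large parallel corpora plus a weighted connection term on the small bridging corpus. The sketch below is a hedged approximation assuming hypothetical `nll` (batch negative log-likelihood) and `sample_pivot` model methods; it is not the paper's implementation, and gradient flow through the sampled pivots is simplified away:

```python
# Hedged sketch of a joint objective coupling the two models.
# All model interfaces here are hypothetical stand-ins.

def joint_loss(src2pvt, pvt2tgt, sp_batch, pt_batch, bridge_batch, lam=1.0):
    # Maximum-likelihood terms on the two large parallel corpora.
    loss_sp = src2pvt.nll(sp_batch.src, sp_batch.pvt)   # source-to-pivot
    loss_pt = pvt2tgt.nll(pt_batch.pvt, pt_batch.tgt)   # pivot-to-target

    # Likelihood connection on the small source-target bridging corpus:
    # approximate E_{z ~ P(z|x)}[log P(y|z)] by translating the source into
    # the pivot language and scoring the known target against it.
    pivots = src2pvt.sample_pivot(bridge_batch.src)
    loss_conn = pvt2tgt.nll(pivots, bridge_batch.tgt)

    # Minimizing the sum trains both models jointly; the connection term is
    # what lets each model's errors be corrected by the other.
    return loss_sp + loss_pt + lam * loss_conn
```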
Conclusion
  • The authors present joint training for pivot-based neural machine translation.
  • It is appealing to combine source and pivot sentences for decoding target sentences [Firat et al., 2016] or to train a multi-source model directly [Zoph and Knight, 2016].
  • The authors plan to study better connection terms for joint training in future work
Tables
  • Table1: Characteristics of Spanish-English, German-English and English-French datasets on the Europarl and WMT corpora. “es” denotes Spanish, “en” denotes English, “de” denotes German, and “fr” denotes French
  • Table2: Comparison between independent and joint training on Spanish-French and German-French translation tasks using the Europarl corpus. English is treated as the pivot language. The BLEU scores are case-insensitive. “*”: significantly better than independent training (p < 0.05); “**”: significantly better than independent training (p < 0.01). We use the statistical significance test with paired bootstrap resampling [Koehn, 2004] (a sketch of this test follows the table list)
  • Table3: Examples of pivot and target translations using the pivot-based translation strategy. We observe that our approaches generate better translations for both pivot and target sentences. We italicize correct translation segments that are no shorter than 2-grams
  • Table4: Results on Spanish-French translation task from WMT corpus. English is treated as the pivot language. “**”: significantly better than independent training (p < 0.01)
  • Table5: Comparison with Firat et al. [2016] on Spanish-French translation task from WMT corpus. “**”: significantly better than independent training (p < 0.01)
  • Table6: Translation performance on bridging corpora
  • Table7: Effect of the data size of source-to-target parallel corpora (Bridge Corpora) used in LIKELIHOOD
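Tables 2 and 4 report significance under paired bootstrap resampling [Koehn, 2004]. Below is a minimal sketch of that test, simplified to per-sentence scores; a faithful BLEU version would recompute corpus-level BLEU from resampled n-gram statistics rather than sum sentence scores:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Simplified paired bootstrap: scores_a[i] and scores_b[i] are
    per-sentence quality scores of systems A and B on test sentence i."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = 0
    for _ in range(n_resamples):
        # Resample the test set with replacement, keeping the pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins_a += 1
    # A is significantly better if it wins on (almost) all resamples,
    # e.g. wins_a / n_resamples >= 0.99 corresponds to p < 0.01.
    return wins_a / n_resamples
```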
Related work
  • Our work is inspired by two lines of research: (1) machine translation with pivot languages and (2) incorporating additional data resource for NMT.

    5.1 Machine Translation with Pivot Languages

    Machine translation suffers from the scarcity of parallel corpora. For low-resource language pairs, a pivot language is introduced to “bridge” the source and target languages in SMT [Cohn and Lapata, 2007; Wu and Wang, 2007; Utiyama and Isahara, 2007; Zahabi et al., 2013; El Kholy et al., 2013].

    In NMT, Firat et al. [2016] and Johnson et al. [2016] propose multi-way, multilingual NMT models that enable zero-resource machine translation, but they still need to apply pivot-based approaches to improve performance. Zoph et al. [2016] adopt transfer learning to fine-tune the parameters of a low-resource language pair using parameters trained on a high-resource language pair. In contrast, our approach aims to jointly train the source-to-pivot and pivot-to-target NMT models, which can alleviate the error propagation of pivot-based approaches. We use connection terms to “bridge” these two models and make them benefit from each other.
Funding
  • This research is supported by the 863 Program (2015AA015407), the National Natural Science Foundation of China (No. 61532001, No. 61522204, No. 61432013), Tsinghua Initiative Research Program grant 20151080475, MOE Online Education Research Center (Quantong Fund) grant 2017ZD203, and gift funds from Huawei and Ant Financial
References
  • [Bahdanau et al., 2015] Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR, 2015.
  • [Bertoldi et al., 2008] Nicola Bertoldi, Madalina Barbaiani, Marcello Federico, and Roldano Cattoni. Phrase-based statistical machine translation with pivot languages. In Proceedings of IWSLT, 2008.
  • [Cheng et al., 2016] Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. Semi-supervised learning for neural machine translation. In Proceedings of ACL, 2016.
  • [Cohn and Lapata, 2007] Trevor Cohn and Mirella Lapata. Machine translation by triangulation: Making effective use of multi-parallel corpora. In Proceedings of ACL, 2007.
  • [El Kholy et al., 2013] Ahmed El Kholy, Nizar Habash, Gregor Leusch, Evgeny Matusov, and Hassan Sawaf. Language independent connectivity strength features for phrase pivot statistical machine translation. In Proceedings of ACL, 2013.
  • [Firat et al., 2016] Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. Zero-resource translation with multi-lingual neural machine translation. In Proceedings of EMNLP, 2016.
  • [Gulcehre et al., 2015] Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. On using monolingual corpora in neural machine translation. arXiv:1503.03535 [cs.CL], 2015.
  • [Jean et al., 2015] Sebastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. On using very large target vocabulary for neural machine translation. In Proceedings of ACL, 2015.
  • [Johnson et al., 2016] Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viegas, Martin Wattenberg, Greg Corrado, et al. Google's multilingual neural machine translation system: Enabling zero-shot translation. arXiv:1611.04558, 2016.
  • [Junczys-Dowmunt et al., 2016] Marcin Junczys-Dowmunt, Tomasz Dwojak, and Hieu Hoang. Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv:1610.01108v2, 2016.
  • [Koehn, 2004] Philipp Koehn. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP, 2004.
  • [Papineni et al., 2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of ACL, 2002.
  • [Sennrich et al., 2016a] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In Proceedings of ACL, 2016.
  • [Sennrich et al., 2016b] Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of ACL, 2016.
  • [Shen et al., 2016] Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. Minimum risk training for neural machine translation. In Proceedings of ACL, 2016.
  • [Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Proceedings of NIPS, 2014.
  • [Utiyama and Isahara, 2007] Masao Utiyama and Hitoshi Isahara. A comparison of pivot methods for phrase-based statistical machine translation. In Proceedings of HLT-NAACL, 2007.
  • [Wu and Wang, 2007] Hua Wu and Haifeng Wang. Pivot language approach for phrase-based statistical machine translation. Machine Translation, 2007.
  • [Zahabi et al., 2013] Samira Tofighi Zahabi, Somayeh Bakhshaei, and Shahram Khadivi. Using context vectors in improving a machine translation system with bridge language. In Proceedings of ACL, 2013.
  • [Zhang and Zong, 2016] Jiajun Zhang and Chengqing Zong. Exploiting source-side monolingual data in neural machine translation. In Proceedings of EMNLP, 2016.
  • [Zoph and Knight, 2016] Barret Zoph and Kevin Knight. Multi-source neural translation. In Proceedings of NAACL, 2016.
  • [Zoph et al., 2016] Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. Transfer learning for low-resource neural machine translation. In Proceedings of EMNLP, 2016.