Multi-Task Learning for Multiple Language Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2015.

Keywords:
multi-task learning model, translation model, language translation, different language pairs

Abstract:

In this paper, we investigate the problem of learning a machine translation model that can simultaneously translate sentences from one source language to multiple target languages. Our solution is inspired by the recently proposed neural machine translation model, which generalizes machine translation as a sequence learning problem. We extend the recurrent neural network based encoder-decoder framework to a multi-task learning model that shares an encoder across all language pairs and uses a separate decoder for each target language.

Introduction
  • Translation from one source language to multiple target languages at the same time is a difficult task for humans.
  • To address the problems described above, the authors propose a multi-task learning framework based on a sequence learning model that translates from one source language into multiple target languages, inspired by the recently proposed neural machine translation (NMT) framework of Bahdanau et al. (2014).
  • The authors extend the recurrent neural network based encoder-decoder framework to a multi-task learning model that shares one encoder across all language pairs and uses a separate decoder for each target language; a minimal sketch of this architecture follows
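  • The sketch below is an illustration only, not the authors' implementation: it omits the attention mechanism of Bahdanau et al. (2014) for brevity, and the class name, embedding size, and hidden size are assumptions made for the example.

    import torch
    import torch.nn as nn

    class SharedEncoderMultiDecoder(nn.Module):
        """One GRU encoder shared by all language pairs; one GRU decoder,
        embedding table, and output layer per target language."""
        def __init__(self, src_vocab, tgt_vocabs, emb=256, hidden=512):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)           # shared
            self.encoder = nn.GRU(emb, hidden, batch_first=True)  # shared
            self.tgt_emb = nn.ModuleDict(
                {l: nn.Embedding(v, emb) for l, v in tgt_vocabs.items()})
            self.decoders = nn.ModuleDict(
                {l: nn.GRU(emb, hidden, batch_first=True) for l in tgt_vocabs})
            self.out = nn.ModuleDict(
                {l: nn.Linear(hidden, v) for l, v in tgt_vocabs.items()})

        def forward(self, src_ids, tgt_ids, lang):
            # Encode the source sentence once with the shared encoder.
            _, h = self.encoder(self.src_emb(src_ids))
            # Decode with the decoder of the requested target language
            # (teacher forcing on tgt_ids).
            dec_out, _ = self.decoders[lang](self.tgt_emb[lang](tgt_ids), h)
            return self.out[lang](dec_out)  # logits over that target vocabulary

    model = SharedEncoderMultiDecoder(
        src_vocab=30000,
        tgt_vocabs={"fr": 30000, "es": 30000, "nl": 30000, "pt": 30000})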
Highlights
  • Translation from one source language to multiple target languages at the same time is a difficult task for humans
  • To address the problems described above, we propose a multi-task learning framework based on a sequence learning model that translates from one source language into multiple target languages, inspired by the recently proposed neural machine translation (NMT) framework of Bahdanau et al. (2014)
  • Given large-scale training corpora for different language pairs, we show that our framework can improve translation quality on each target language as compared with the neural translation model trained on a single language pair
  • We use the 30k most frequent words as the source-language vocabulary, which is shared across the different language pairs, and the 30k most frequent words of each target language as its vocabulary
  • We notice that even though Dutch is a Germanic language, it is still possible to increase its translation performance under our multi-task learning framework, which demonstrates that our model generalizes to multiple target languages
  • We investigate the problem of how to translate one source language into several different target languages within a unified translation model
Methods
  • The goal of the first experiment is to show that multi-task learning helps to improve translation performance given enough training corpora for all language pairs.
  • The authors show that for some resource-poor language pairs with only a small amount of parallel training data, translation performance can be improved as well.
  • The authors use the 30k most frequent words as the source-language vocabulary, which is shared across the different language pairs, and the 30k most frequent words of each target language as its vocabulary (see the vocabulary sketch after this list).
  • The sizes of the training corpora used in Experiments 1 and 2 are listed in Table 1.
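  • The vocabulary setup above can be sketched as follows; the tokenization and special tokens are assumptions, but the idea is a single 30k-word source vocabulary shared across all language pairs plus a separate 30k-word vocabulary per target language.

    from collections import Counter

    def build_vocab(sentences, size=30000, specials=("<pad>", "<unk>", "<s>", "</s>")):
        # Keep the `size` most frequent whitespace tokens, reserving ids for specials.
        counts = Counter(tok for sent in sentences for tok in sent.split())
        words = [w for w, _ in counts.most_common(size - len(specials))]
        return {w: i for i, w in enumerate(list(specials) + words)}

    def build_all_vocabs(src_corpora, tgt_corpora):
        # src_corpora: {"en-fr": [...], ...}  source sides of every language pair
        # tgt_corpora: {"fr": [...], ...}     target sides, one list per language
        shared_src = build_vocab(
            s for corpus in src_corpora.values() for s in corpus)
        per_target = {lang: build_vocab(sents) for lang, sents in tgt_corpora.items()}
        return shared_src, per_target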
Results
  • The authors train the multi-task learning model jointly on all four parallel corpora and compare its BLEU scores with those of models trained separately on each parallel corpus (a sketch of such a joint training loop follows this list).
  • Table 4 shows that given only 15% of the English-Dutch and English-Portuguese parallel training corpora, it is still possible to improve translation performance on all the target languages
  • This result makes sense because correlated languages benefit from each other by sharing the same predictive structure, e.g. French, Spanish and Portuguese, all of which descend from Latin.
  • The authors notice that even though Dutch is a Germanic language, it is still possible to increase its translation performance under the multi-task learning framework, which demonstrates that the model generalizes to multiple target languages
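  • A hedged sketch of such joint training is given below, assuming a model with the interface of the earlier architecture sketch: mini-batches are drawn by cycling over the language pairs, so the shared encoder is updated by every pair while each decoder only sees its own target language. The batching schedule, optimizer settings, and padding conventions are assumptions rather than the paper's exact recipe (Adadelta, cited in the reference list, is used here).

    import itertools
    import torch
    import torch.nn as nn

    def train_joint(model, batches_by_lang, steps=1000):
        # batches_by_lang: {"fr": [(src, tgt_in, tgt_out), ...], "es": [...], ...}
        opt = torch.optim.Adadelta(model.parameters())
        loss_fn = nn.CrossEntropyLoss(ignore_index=0)  # assume id 0 is <pad>
        iters = {l: itertools.cycle(b) for l, b in batches_by_lang.items()}
        langs = itertools.cycle(sorted(iters))
        for _ in range(steps):
            lang = next(langs)                       # round-robin over language pairs
            src, tgt_in, tgt_out = next(iters[lang])
            logits = model(src, tgt_in, lang)        # (batch, tgt_len, vocab)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()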
Conclusion
  • The authors investigate the problem of how to translate one source language into several different target languages within a unified translation model.
  • (Sample translation outputs omitted; see Table 7 for translations into different target languages given the same input.)
Tables
  • Table1: Size of training corpus for different language pairs
  • Table2: Size of the test sets. Two test sets are used: the EuroParl Common test set from the European Parliament Corpus and the WMT 2013 test set. For WMT 2013, only En-Fr and En-Es are available, so translation performance on WMT 2013 is evaluated only for these two language pairs
  • Table3: Multi-task neural translation vs. single models, given large-scale corpora for all language pairs
  • Table4: Multi-task neural translation vs. single models when the training corpus is small-scale for some language pairs. * marks a sub-sampled language pair
  • Table5: Multi-task NMT vs. single models vs. Moses on the WMT 2013 test set
  • Table6: Source language nearest-neighbor comparison between the multi-task model and the single model
  • Table7: Translation of different target languages given the same input in our multi-task model
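  • The BLEU comparisons in Tables 3-5 can be reproduced in spirit with a small scoring helper. NLTK's corpus_bleu is used below as an assumed stand-in for the original BLEU scripts of Papineni et al. (2002); the file names in the usage comment are hypothetical.

    from nltk.translate.bleu_score import corpus_bleu

    def bleu(ref_sentences, hyp_sentences):
        # One reference per sentence; both sides are whitespace-tokenized.
        refs = [[r.split()] for r in ref_sentences]
        hyps = [h.split() for h in hyp_sentences]
        return corpus_bleu(refs, hyps)

    # e.g. compare the jointly trained model against a single-pair baseline:
    # bleu(open("test.fr").read().splitlines(), open("multi_task.fr").read().splitlines())
    # bleu(open("test.fr").read().splitlines(), open("single.fr").read().splitlines())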
Related work
  • Statistical machine translation systems often rely on large-scale parallel and monolingual training corpora to generate high-quality translations. Unfortunately, statistical machine translation often suffers from data sparsity because phrase tables are extracted from limited bilingual corpora. Much work has been done to address the data sparsity problem, such as the pivot language approach (Wu and Wang, 2007; Cohn and Lapata, 2007) and deep learning techniques (Devlin et al., 2014; Gao et al., 2014; Sundermeyer et al., 2014; Liu et al., 2014).

    On the problem of translating one source language into many target languages within one model, little work has been done in statistical machine translation. A related line of work in SMT is the pivot language approach, which uses a widely spoken language as a "bridge" to generate source-target translations for language pairs with little parallel data. Pivot-based statistical machine translation is crucial for resource-poor language pairs, such as Spanish to Chinese. For translating one source language into many target languages, pivot-based SMT approaches work well given a large-scale source-to-pivot bilingual corpus and large-scale pivot-to-target corpora. In reality, however, parallel corpora between English and many other target languages may not be large enough, and pivot-based SMT sometimes fails in this setting. Our approach handles one-to-many translation differently: based on the idea of neural machine translation, we directly learn an end-to-end translation system for all target languages that does not need a pivot language (a small illustrative contrast follows).
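    The contrast can be illustrated with a small hypothetical sketch; the translate_* callables and the model.translate method stand in for trained systems and are not part of the paper.

    def pivot_translate(src_sentence, translate_src_to_pivot, translate_pivot_to_tgt):
        # Two chained systems: errors from the first stage propagate into the
        # second, and large source-pivot and pivot-target corpora are both needed.
        pivot = translate_src_to_pivot(src_sentence)
        return translate_pivot_to_tgt(pivot)

    def direct_one_to_many(src_sentence, model, target_langs):
        # One-to-many NMT: a single shared encoding of the source sentence,
        # then one language-specific decoder per requested target language.
        return {lang: model.translate(src_sentence, lang) for lang in target_langs}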
Funding
  • This paper is supported by the 973 Program, No. 2014CB340505.
Reference
  • Rie Kubota Ando and Tong Zhang. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
  • Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, and Yoshua Bengio. 2012. Theano: new features and speed improvements. CoRR, abs/1211.5590.
  • Léon Bottou. 1991. Stochastic gradient learning in neural networks. In Proceedings of Neuro-Nîmes 91, Nîmes, France. EC2.
  • KyungHyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259.
  • Trevor Cohn and Mirella Lapata. 2007. Machine translation by triangulation: Making effective use of multi-parallel corpora. In Proc. ACL, pages 728–735.
  • Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.
  • Lei Cui, Xilun Chen, Dongdong Zhang, Shujie Liu, Mu Li, and Ming Zhou. 2013. Multi-domain adaptation for SMT using multi-task learning. In Proc. EMNLP, pages 1055–1065.
  • Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard M. Schwartz, and John Makhoul. 2014. Fast and robust neural network joint models for statistical machine translation. In Proc. ACL, pages 1370–1380.
  • Jianfeng Gao, Xiaodong He, Wen-tau Yih, and Li Deng. 2014. Learning continuous phrase representations for translation modeling. In Proc. ACL, pages 699–709.
  • Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2012. Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese. In Proc. ACL, pages 1045–1053.
  • Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proc. EMNLP, pages 1700–1709.
  • Philipp Koehn. 2004. Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In Proc. AMTA 2004, pages 115–124.
  • Zhenghua Li, Min Zhang, Wanxiang Che, Ting Liu, and Wenliang Chen. 2014. Joint optimization for Chinese POS tagging and dependency parsing. IEEE/ACM Transactions on Audio, Speech & Language Processing, 22(1):274–286.
  • Shujie Liu, Nan Yang, Mu Li, and Ming Zhou. 2014. A recursive recurrent neural network for statistical machine translation. In Proc. ACL, pages 1491–1500.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proc. ACL, pages 311–318.
  • Rico Sennrich, Holger Schwenk, and Walid Aransa. 2013. A multi-domain translation model framework for statistical machine translation. In Proc. ACL, pages 832–840.
  • Martin Sundermeyer, Tamer Alkhouli, Joern Wuebker, and Hermann Ney. 2014. Translation modeling with bidirectional recurrent neural networks. In Proc. EMNLP, pages 14–25.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (NIPS 2014), pages 3104–3112.
  • Hua Wu and Haifeng Wang. 2007. Pivot language approach for phrase-based statistical machine translation. In Proc. ACL, pages 165–181.
  • Matthew D. Zeiler. 2012. ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701.