THUMT: An Open Source Toolkit for Neural Machine Translation

arXiv:1706.06415 [cs.CL], 2017.

Keywords:
natural language processing group, layer-wise relevance propagation, open source toolkit, standard attention, encoder-decoder framework

Abstract:

This paper introduces THUMT, an open-source toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attention-based encoder-decoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semi-supervised training. It also features a visualization tool that displays the relevance between hidden states of neural networks and contextual words, which helps to analyze the internal workings of NMT. Experiments on Chinese-English translation show that THUMT with minimum risk training significantly outperforms GroundHog, a state-of-the-art NMT toolkit.

Introduction
  • End-to-end neural machine translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2015) has gained increasing popularity in the machine translation community.
  • On top of Theano (Bergstra et al., 2010), THUMT implements the standard attention-based encoder-decoder framework for NMT (Bahdanau et al., 2015); a minimal sketch of this attention mechanism follows the list.
  • It supports three training criteria: maximum likelihood estimation (MLE), minimum risk training (MRT), and semi-supervised training (SST).
  • The authors compare THUMT with the state-of-the-art open-source toolkit GroundHog (Bahdanau et al., 2015) and achieve significant improvements on Chinese-English translation tasks by introducing new training criteria and optimizers.
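To make the framework concrete, below is a minimal NumPy sketch of Bahdanau-style additive attention, the mechanism at the core of the encoder-decoder model: the previous decoder state scores each encoder annotation, and the softmax-normalized scores produce a context vector. All function and variable names are illustrative, not THUMT's actual API.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Bahdanau-style additive attention (illustrative sketch).

    decoder_state:  (d,)    previous decoder hidden state s_{t-1}
    decoder_state is scored against every encoder annotation h_j:
    encoder_states: (T, d)  encoder annotations h_1..h_T
    W_s, W_h:       (d, d)  learned projections
    v:              (d,)    learned scoring vector
    Returns the context vector c_t and attention weights alpha_t.
    """
    # e_tj = v^T tanh(W_s s_{t-1} + W_h h_j) for every source position j
    scores = np.tanh(decoder_state @ W_s + encoder_states @ W_h) @ v  # (T,)
    alpha = softmax(scores)                                           # (T,)
    context = alpha @ encoder_states                                  # (d,)
    return context, alpha

# Toy usage with random parameters
rng = np.random.default_rng(0)
d, T = 8, 5
c, a = additive_attention(rng.normal(size=d), rng.normal(size=(T, d)),
                          rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                          rng.normal(size=d))
print(a.sum())  # attention weights sum to 1
```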
Highlights
  • End-to-end neural machine translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2015) has gained increasing popularity in the machine translation community
  • This paper introduces THUMT, an open-source toolkit developed by the Tsinghua Natural Language Processing Group
  • To facilitate the analysis of the translation process in NMT, THUMT provides a visualization tool, based on layer-wise relevance propagation, that calculates the relevance between hidden layers of neural networks and contextual words (see the first sketch after this list)
  • Experimental results show that the translation performance of THUMT is comparable to GroundHog using maximum likelihood estimation (MLE)
  • Because it can incorporate evaluation metrics directly into training (see the second sketch after this list), minimum risk training (MRT) obtains significant improvements over MLE; another finding is that Adam leads to consistent and significant improvements over AdaDelta
  • While MRT proves to improve substantially over standard MLE, semi-supervised training is capable of exploiting monolingual corpora to improve low-resource translation
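The visualization tool's relevance scores are computed with layer-wise relevance propagation (Bach et al., 2015; Ding et al., 2017), which redistributes relevance from an output backwards through the network, layer by layer. Below is a minimal NumPy sketch of the redistribution rule for a single linear layer; the epsilon stabilizer and all names are illustrative assumptions, not THUMT's code.

```python
import numpy as np

def lrp_linear(x, W, relevance_out, eps=1e-6):
    """Redistribute relevance through one linear layer y = x @ W.

    x:             (n,)   layer input (e.g., a hidden state or word vector)
    W:             (n, m) weight matrix
    relevance_out: (m,)   relevance assigned to the layer's output
    Returns relevance_in (n,), approximately conserving total relevance.
    """
    z = x[:, None] * W                                   # (n, m) contribution of x_i to y_j
    denom = z.sum(axis=0)                                # (m,)  total input to each y_j
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)  # stabilize near zero
    return (z / denom) @ relevance_out                   # (n,)  relevance flowing back to x
```

Applying this rule repeatedly from a chosen target word back to the input embeddings yields a relevance score for every source and target context word, which is what the tool displays.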
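Minimum risk training (Shen et al., 2016) is the second highlight worth making explicit: it minimizes the expected loss over a sampled set of candidate translations, so any sentence-level metric (e.g., 1 − sentence-BLEU) can enter the objective directly. A hedged sketch of the per-sentence risk, assuming the log-probabilities of k sampled candidates have already been computed:

```python
import numpy as np

def expected_risk(log_probs, losses, alpha=5e-3):
    """Expected risk over a sampled subset of candidate translations.

    log_probs: (k,) model log-probabilities of k sampled candidates
    losses:    (k,) per-candidate loss, e.g. 1 - sentence-BLEU vs. the reference
    alpha:     sharpness hyper-parameter from Shen et al. (2016); the value
               here is illustrative
    """
    # Q(y|x) renormalizes the (sharpened) model distribution over the sample
    scaled = alpha * log_probs
    q = np.exp(scaled - np.max(scaled))
    q = q / q.sum()
    return float(q @ losses)  # minimize this instead of negative log-likelihood
```

Minimizing this expected risk, rather than the negative log-likelihood of a single reference, is what lets MRT optimize BLEU-like metrics directly.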
Methods
  • [Flattened table residue, apparently Table 4: columns for criterion (MLE, MRT, SST) and optimizer (AdaDelta, Adam, Adam), with iteration counts and training time.]
  • Setup: The authors evaluate THUMT on the Chinese-English translation task.
  • The training set contains 1.25M sentence pairs with 27.9M Chinese words and 34.5M English words.
  • For SST, the Chinese monolingual corpus contains 18.75M sentences with 451.94M words.
  • The English corpus contains 22.32M sentences with 399.83M words.
  • The vocabulary sizes of Chinese and English are 0.97M and 1.34M, respectively.
  • The evaluation metric is case-insensitive BLEU score (Papineni et al., 2002); a toy single-reference implementation follows this list.
  • The authors' baseline system is GroundHog (Bahdanau et al., 2015), a state-of-the-art open-source NMT toolkit.
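For reference, BLEU combines modified n-gram precisions with a brevity penalty; case-insensitive evaluation simply lowercases candidate and reference first. The sketch below is a simplified single-reference toy version; the exact evaluation script and number of references are not specified in this summary.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Case-insensitive, single-reference sentence BLEU (illustrative only)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        # Modified n-gram precision: clip candidate counts by reference counts
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(1, sum(cand_ngrams.values()))
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    # Brevity penalty punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1.0 - len(ref) / max(1, len(cand))))
    return bp * math.exp(log_prec)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0
```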
Results
  • Table 1 shows the BLEU scores obtained by GroundHog and THUMT using different training criteria and optimizers.
  • Experimental results show that the translation performance of THUMT is comparable to GroundHog using MLE.
  • Because MRT can incorporate evaluation metrics directly into training, it obtains significant improvements over MLE.
  • Another finding is that Adam leads to consistent and significant improvements over AdaDelta.
  • Table 3 shows that replacing unknown words leads to consistent improvements for all training criteria and optimizers; a sketch of this post-processing step follows the list.
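The unknown-word replacement of Table 3 presumably follows the post-processing technique of Luong et al. (2015), which the paper cites: each target-side UNK token is replaced by a dictionary translation, or a direct copy, of the source word it attends to most strongly. A minimal sketch, where `attn` and `lexicon` are assumed inputs rather than THUMT's actual interface:

```python
import numpy as np

def replace_unk(target_tokens, source_tokens, attn, lexicon, unk="<unk>"):
    """Replace each target-side UNK using the most-attended source word.

    attn:    (len(target_tokens), len(source_tokens)) attention weights
             collected during decoding
    lexicon: dict mapping source words to their most likely translation
    """
    output = []
    for t, token in enumerate(target_tokens):
        if token == unk:
            src_word = source_tokens[int(np.argmax(attn[t]))]
            # Fall back to copying the source word (useful for names, numbers)
            output.append(lexicon.get(src_word, src_word))
        else:
            output.append(token)
    return output
```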
Conclusion
  • The authors have introduced a new open-source toolkit for NMT that supports two new training criteria: minimum risk training (Shen et al., 2016) and semi-supervised training (Cheng et al., 2016).
  • While minimum risk training proves to improve substantially over standard maximum likelihood estimation, semi-supervised training is capable of exploiting monolingual corpora to improve low-resource translation; a schematic of the semi-supervised objective follows this section.
  • The toolkit features a visualization tool for analyzing the translation process of THUMT.
  • The toolkit is freely available at http://thumt.thunlp.org
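Semi-supervised training (Cheng et al., 2016) augments MLE on the parallel corpus with autoencoder-style reconstruction terms on monolingual data: a monolingual source sentence is translated by the source-to-target model and scored for reconstruction by the target-to-source model, and symmetrically for the target side. The sketch below is schematic; `src2tgt`, `tgt2src`, and their `log_prob`/`sample` methods are assumed interfaces, not THUMT's API.

```python
def sst_loss(src2tgt, tgt2src, parallel_batch, mono_src_batch, mono_tgt_batch,
             lam=0.1, num_samples=1):
    """Joint semi-supervised objective: MLE plus round-trip reconstruction.

    src2tgt / tgt2src: hypothetical translation models with
        log_prob(source_sentence, target_sentence) and
        sample(source_sentence, k) -> k sampled translations.
    lam: weight of the reconstruction (autoencoder) terms.
    """
    # Supervised term: standard MLE on the parallel corpus, both directions
    loss = sum(-src2tgt.log_prob(x, y) - tgt2src.log_prob(y, x)
               for x, y in parallel_batch)

    # Source autoencoder: translate x forward, score its reconstruction
    for x in mono_src_batch:
        for y_hat in src2tgt.sample(x, num_samples):
            loss += -lam * tgt2src.log_prob(y_hat, x)

    # Target autoencoder: symmetric term for target-side monolingual data
    for y in mono_tgt_batch:
        for x_hat in tgt2src.sample(y, num_samples):
            loss += -lam * src2tgt.log_prob(x_hat, y)
    return loss
```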
Tables
  • Table 1: Comparison between GroundHog and THUMT
  • Table 2: Comparison between MLE and SST
  • Table 3: Effect of replacing unknown words
  • Table 4: Comparison of training time between MLE, MRT, and SST
Funding
  • This research is supported by the National Natural Science Foundation of China (No. 61522204).
References
  • Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR.
  • James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: a CPU and GPU math compiler in Python. In Proceedings of the 9th Python in Science Conference.
  • Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Semi-supervised learning for neural machine translation. In Proceedings of ACL.
  • Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of EMNLP.
  • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of NIPS.
  • Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Visualizing and understanding neural machine translation. In Proceedings of ACL.
  • Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM Model 2. In Proceedings of ACL.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.
  • Marcin Junczys-Dowmunt, Tomasz Dwojak, and Hieu Hoang. 2016. Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv:1610.01108.
  • Diederik Kingma and Jimmy Ba. 2015. Adam: a method for stochastic optimization. In Proceedings of ICLR.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of ACL.
  • Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba. 2015. Addressing the rare word problem in neural machine translation. In Proceedings of ACL.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL.
  • Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Minimum risk training for neural machine translation. In Proceedings of ACL.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of NIPS.
  • Matthew D. Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv:1212.5701.