XNMT: The eXtensible Neural Machine Translation Toolkit

AMTA, pp. 185-192, 2018.

Keywords:
Low Resource Languages for Emergent Incidents; extensible neural machine translation toolkit; open source; speech recognition; XNMT

Abstract:

This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system.

Introduction
  • Due to the effectiveness and relative ease of implementation, there is a proliferation of toolkits for neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015), as many as 51 according to the tally by nmt-list. The common requirements for such toolkits are speed, memory efficiency, and translation accuracy, which are essential for the use of such systems in practical translation settings.
  • Instead of only optimizing time for training or inference, XNMT aims to reduce the time it takes for a researcher to turn their idea into a practical experimental setting, test with a large number of parameters, and produce valid and trustworthy research results.
  • This necessitates a certain level of training efficiency and accuracy, but XNMT also takes into account a number of further considerations, described in the sections below.
Highlights
  • Due to the effectiveness and relative ease of implementation, there is now a proliferation of toolkits for neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015), as many as 51 according to the tally by nmt-list. The common requirements for such toolkits are speed, memory efficiency, and translation accuracy, which are essential for the use of such systems in practical translation settings.
  • Many open source toolkits do an excellent job at this, to the point where they can be used in production systems (e.g., OpenNMT is used by Systran (Crego et al., 2016)).
  • This paper describes XNMT, the eXtensible Neural Machine Translation toolkit, a toolkit that optimizes not for efficiency, but instead for ease of use in practical research settings
  • In the remainder of the paper, we provide some concrete examples of the design principles behind XNMT, and a few examples of how it can be used to implement standard models
  • As Figure 1 demonstrates, XNMT supports the basic functionality for experiments described in §2.1
  • This paper has introduced XNMT, an NMT toolkit with extensibility in mind, and has described the various design decisions that went into making this goal possible
Methods
  • Experimental Setup and Support: As Figure 1 demonstrates, XNMT supports the basic functionality for experiments described in §2.1.
  • Multiple experiments and sharing of parameters: Multiple experiments can be specified in a single YAML file by defining multiple top-level elements of that file.
  • These experiments can share settings through YAML anchors, where one experiment inherits the settings of another and overrides only the settings that need to be changed; see the sketch after this list.
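    To make this mechanism concrete, below is a minimal sketch of such a configuration file. It relies only on standard YAML anchor and merge-key syntax; the experiment names and parameter keys (exp1-baseline, encoder_layers, and so on) are illustrative placeholders rather than the exact XNMT configuration schema.

        # Two experiments in one YAML file; the second inherits all settings
        # from the first via an anchor and overrides only what changes.
        exp1-baseline: &baseline        # anchor marks these settings for reuse
          train:
            src_file: data/train.ja
            trg_file: data/train.en
            num_epochs: 20
          model:
            encoder_layers: 1
            hidden_dim: 512

        exp2-deeper:
          <<: *baseline                 # inherit every top-level setting above
          model:                        # replaces the whole 'model' block
            encoder_layers: 2
            hidden_dim: 512

    Note that the YAML merge key replaces whole top-level mappings rather than merging them recursively, so the overriding experiment restates the full model block; the train settings it leaves untouched are inherited from the anchor unchanged.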
Conclusion
  • This paper has introduced XNMT, an NMT toolkit with extensibility in mind, and has described the various design decisions that went into making this goal possible.
Tables
  • Table 1: Speech recognition results (WER in %) compared to a similar pyramidal LSTM model (Zhang et al., 2017) and a highly engineered hybrid HMM system (Rousseau et al., 2014).
Funding
  • Part of the development of XNMT was performed at the Jelinek Summer Workshop in Speech and Language Technology (JSALT) “Speaking Rosetta Stone” project (Scharenborg et al., 2018), and we are grateful to the JSALT organizers for their financial and logistical support, and to the workshop participants for their feedback on XNMT as a tool. Parts of this work were sponsored by the Defense Advanced Research Projects Agency Information Innovation Office (I2O).
References
  • Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
  • Chan, W., Jaitly, N., Le, Q., and Vinyals, O. (2016). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 4960–4964. IEEE.
  • Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., Akhanov, E., Brunelle, P., Coquard, A., Deng, Y., et al. (2016). Systran's pure neural machine translation systems. arXiv preprint arXiv:1610.05540.
  • Dai, A. M. and Le, Q. V. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, pages 3079–3087.
  • Gal, Y. and Ghahramani, Z. (2016). A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1019–1027.
  • Harwath, D., Torralba, A., and Glass, J. (2016). Unsupervised learning of spoken language with visual context. In Advances in Neural Information Processing Systems, pages 1858–1866.
  • Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., and Heck, L. (2013). Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 2333–2338. ACM.
  • Kalchbrenner, N. and Blunsom, P. (2013). Recurrent continuous translation models. In Empirical Methods in Natural Language Processing (EMNLP), pages 1700–1709, Seattle, Washington, USA.
  • Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
  • Neubig, G., Dyer, C., Goldberg, Y., Matthews, A., Ammar, W., Anastasopoulos, A., Ballesteros, M., Chiang, D., Clothiaux, D., Cohn, T., Duh, K., Faruqui, M., Gan, C., Garrette, D., Ji, Y., Kong, L., Kuncoro, A., Kumar, G., Malaviya, C., Michel, P., Oda, Y., Richardson, M., Saphra, N., Swayamdipta, S., and Yin, P. (2017). DyNet: The Dynamic Neural Network Toolkit. arXiv preprint arXiv:1701.03980.
  • Paul, D. B. and Baker, J. M. (1992). The design for the Wall Street Journal-based CSR corpus. In Proceedings of the Workshop on Speech and Natural Language, pages 357–362. Association for Computational Linguistics.
  • Press, O. and Wolf, L. (2016). Using the output embedding to improve language models. arXiv preprint arXiv:1608.05859.
  • Ranzato, M., Chopra, S., Auli, M., and Zaremba, W. (2015). Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732.
  • Rousseau, A., Deléglise, P., and Estève, Y. (2014). Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks. In LREC, pages 3935–3939.
  • Scharenborg, O., Besacier, L., Black, A., Hasegawa-Johnson, M., Metze, F., Neubig, G., Stüker, S., Godard, P., Müller, M., Ondel, L., Palaskar, S., Arthur, P., Ciannella, F., Du, M., Larsen, E., Merkx, D., Riad, R., Wang, L., and Dupoux, E. (2018). Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking Rosetta” JSALT 2017 workshop. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Calgary, Canada.
  • Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In 54th Annual Meeting of the Association for Computational Linguistics.
  • Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., and Liu, Y. (2016). Minimum risk training for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1683–1692, Berlin, Germany. Association for Computational Linguistics.
  • Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 3104–3112, Montreal, Canada.
  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.
  • Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., and Hinton, G. E. (2015). Grammar as a foreign language. In Advances in Neural Information Processing Systems 28 (NIPS 2015), pages 2773–2781, Montreal, Quebec, Canada.
  • Zhang, X., Zhao, J., and LeCun, Y. (2015). Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pages 649–657.
  • Zhang, Y., Chan, W., and Jaitly, N. (2017). Very deep convolutional networks for end-to-end speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 4845–4849. IEEE.