XNMT: The eXtensible Neural Machine Translation Toolkit
AMTA, pp. 185-192, 2018.
Keywords:
Low Resource Languages for Emergent Incidents; extensible neural machine translation toolkit; open source; speech recognition; XNMT
Abstract:
This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing.
Introduction
- Due to the effectiveness and relative ease of implementation, there is now a proliferation of toolkits for neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015), as many as 51 according to the tally by nmt-list. The common requirements for such toolkits are speed, memory efficiency, and translation accuracy, which are essential for the use of such systems in practical translation settings.
- Instead of only optimizing time for training or inference, XNMT aims to reduce the time it takes for a researcher to turn their idea into a practical experimental setting, test it with a large number of parameters, and produce valid and trustworthy research results.
- This necessitates a certain level of training efficiency and accuracy, but XNMT also takes into account a number of additional considerations, such as those described below.
Highlights
- Due to the effectiveness and relative ease of implementation, there is now a proliferation of toolkits for neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015), as many as 51 according to the tally by nmt-list. The common requirements for such toolkits are speed, memory efficiency, and translation accuracy, which are essential for the use of such systems in practical translation settings.
- Many open-source toolkits do an excellent job at this, to the point where they can be used in production systems (e.g. OpenNMT is used by Systran (Crego et al., 2016)).
- This paper describes XNMT, the eXtensible Neural Machine Translation toolkit, a toolkit that optimizes not for efficiency, but instead for ease of use in practical research settings.
- In the remainder of the paper, we provide some concrete examples of the design principles behind XNMT, and a few examples of how it can be used to implement standard models.
- As Figure 1 demonstrates, XNMT supports the basic functionality for experiments described in §2.1.
- This paper has introduced XNMT, an NMT toolkit designed with extensibility in mind, and has described the various design decisions that went into making this goal possible.
Methods
- Experimental Setup and Support
As Figure 1 demonstrates, XNMT supports the basic functionality for experiments described in §2.1.
- Multiple experiments and sharing of parameters: Multiple experiments can be specified in a single YAML file by defining multiple top-level elements of that file.
- These multiple experiments can share settings through YAML anchors, where one experiment inherits the settings from another and overwrites only the relevant settings that need to be changed; a minimal configuration sketch follows this list.
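To make the anchor-and-override mechanism concrete, the following is a minimal sketch of a two-experiment YAML file. The experiment names and option names (trainer, learning_rate, src_file, trg_file, eval_metrics) are illustrative placeholders, not XNMT's exact configuration schema; only the YAML anchor and merge-key mechanics are the point.

```yaml
# Two experiments defined as two top-level elements of one YAML file.
# Option names are illustrative placeholders, not XNMT's exact schema.
exp1-baseline:
  train: &base_train            # anchor the shared training settings
    trainer: adam
    learning_rate: 0.0002
    src_file: data/train.ja
    trg_file: data/train.en
  evaluate: &base_eval          # anchor the shared evaluation settings
    eval_metrics: bleu

exp2-higher-lr:
  train:
    <<: *base_train             # inherit every training setting ...
    learning_rate: 0.001        # ... and override only the one that changes
  evaluate: *base_eval          # reuse the evaluation settings unchanged
```

Each top-level element corresponds to one experiment, so a single pass over this file covers both configurations. Note that YAML merge keys (`<<`) perform a shallow merge, which is why the anchor in this sketch is placed on the train sub-map rather than on the whole experiment.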
Conclusion
- This paper has introduced XNMT, an NMT toolkit designed with extensibility in mind, and has described the various design decisions that went into making this goal possible.
Tables
- Table 1: Speech recognition results (WER in %) compared to a similar pyramidal LSTM model (Zhang et al., 2017) and a highly engineered hybrid HMM system (Rousseau et al., 2014).
Funding
- Part of the development of XNMT was performed at the Jelinek Summer Workshop in Speech and Language Technology (JSALT) “Speaking Rosetta Stone” project (Scharenborg et al., 2018), and we are grateful to the JSALT organizers for their financial and logistical support, and to the participants of the workshop for their feedback on XNMT as a tool. Parts of this work were sponsored by the Defense Advanced Research Projects Agency (DARPA) Information Innovation Office (I2O).
References
- Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. Proc. of ICLR.
- Chan, W., Jaitly, N., Le, Q., and Vinyals, O. (2016). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4960–4964. IEEE.
- Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., Akhanov, E., Brunelle, P., Coquard, A., Deng, Y., et al. (2016). Systran’s pure neural machine translation systems. arXiv preprint arXiv:1610.05540.
- Dai, A. M. and Le, Q. V. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, pages 3079–3087.
- Gal, Y. and Ghahramani, Z. (2016). A theoretically grounded application of dropout in recurrent neural networks. In Advances in neural information processing systems, pages 1019–1027.
- Harwath, D., Torralba, A., and Glass, J. (2016). Unsupervised learning of spoken language with visual context. In Advances in Neural Information Processing Systems, pages 1858–1866.
- Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., and Heck, L. (2013). Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, pages 2333–2338. ACM.
- Kalchbrenner, N. and Blunsom, P. (2013). Recurrent Continuous Translation Models. In Empirical Methods in Natural Language Processing (EMNLP), pages 1700–1709, Seattle, Washington, USA.
- Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
- Neubig, G., Dyer, C., Goldberg, Y., Matthews, A., Ammar, W., Anastasopoulos, A., Ballesteros, M., Chiang, D., Clothiaux, D., Cohn, T., Duh, K., Faruqui, M., Gan, C., Garrette, D., Ji, Y., Kong, L., Kuncoro, A., Kumar, G., Malaviya, C., Michel, P., Oda, Y., Richardson, M., Saphra, N., Swayamdipta, S., and Yin, P. (2017). DyNet: The Dynamic Neural Network Toolkit. arXiv preprint arXiv:1701.03980.
- Paul, D. B. and Baker, J. M. (1992). The design for the Wall Street Journal-based CSR corpus. In Proceedings of the Workshop on Speech and Natural Language, pages 357–362. Association for Computational Linguistics.
- Press, O. and Wolf, L. (2016). Using the output embedding to improve language models. arXiv preprint arXiv:1608.05859.
- Ranzato, M., Chopra, S., Auli, M., and Zaremba, W. (2015). Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732.
- Rousseau, A., Deléglise, P., and Estève, Y. (2014). Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks. In LREC, pages 3935–3939.
- Scharenborg, O., Besacier, L., Black, A., Hasegawa-Johnson, M., Metze, F., Neubig, G., Stüker, S., Godard, P., Müller, M., Ondel, L., Palaskar, S., Arthur, P., Ciannella, F., Du, M., Larsen, E., Merkx, D., Riad, R., Wang, L., and Dupoux, E. (2018). Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking Rosetta” JSALT 2017 workshop. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Calgary, Canada.
- Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.
- Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., and Liu, Y. (2016). Minimum risk training for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1683–1692, Berlin, Germany. Association for Computational Linguistics.
- Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems (NIPS), pages 3104–3112, Montreal, Canada.
- Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.
- Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., and Hinton, G. E. (2015). Grammar as a foreign language. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2773–2781.
- Zhang, X., Zhao, J., and LeCun, Y. (2015). Character-level convolutional networks for text classification. In Advances in neural information processing systems, pages 649–657.
- Zhang, Y., Chan, W., and Jaitly, N. (2017). Very deep convolutional networks for end-to-end speech recognition. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4845–4849. IEEE.