Multilingual Denoising Pre-training for Neural Machine Translation

Keywords:
Common Crawl, monolingual corpora, neural machine translation, document level, low resource
TL;DR:
We demonstrate that multilingual denoising pre-training significantly improves both supervised and unsupervised machine translation at both the sentence level and the document level.

Abstract:

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text.

Introduction
Highlights
  • Despite its wide adoption for other NLP tasks (Devlin et al., 2019; Liu et al., 2019; Yang et al., 2019; Lewis et al., 2019; Raffel et al., 2019), self-supervised pre-training is not yet common practice in machine translation (MT)
  • Pre-trained: The MT models initialized with pre-trained weights outperform randomly initialized models by large margins, for both sentence-level and document-level training
  • Doc-MT: For both cases (En-De, En-Zh), the mBART25 Doc-MT models outperform the same models fine-tuned at sentence level by a clear margin; the opposite holds for models without pre-training
  • Randomly initialized Doc-MT models fail to work, producing much worse results than the sentence-level models; such large performance gaps indicate that pre-training is critical for document-level performance
  • The second case of unsupervised machine translation assumes the target language appears in a bitext corpus with some other source language
  • We show that mBART can improve performance even with fine-tuning for languages that did not appear in the pre-training corpora, suggesting that the pre-training has language-universal aspects, especially within the parameters learned at the Transformer layers (a usage sketch follows this list)
  • We demonstrate that multilingual denoising pre-training significantly improves both supervised and unsupervised machine translation at both the sentence level and the document level
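As a concrete illustration of the fine-tune-then-translate workflow highlighted above, the sketch below loads a fine-tuned mBART checkpoint and translates one English sentence into Romanian. It assumes the Hugging Face transformers port of mBART and the publicly released facebook/mbart-large-en-ro checkpoint, neither of which is part of the paper itself (the original models were released through fairseq); treat it as a minimal sketch rather than the authors' pipeline.

# Minimal sketch: translating with a fine-tuned mBART checkpoint.
# Assumes the Hugging Face transformers port of mBART and the publicly released
# facebook/mbart-large-en-ro checkpoint (not the paper's own fairseq code).
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

batch = tokenizer("UN Chief Says There Is No Military Solution in Syria",
                  return_tensors="pt")

# Decoding starts from the target-language id token, mirroring mBART fine-tuning.
generated = model.generate(
    **batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])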
Results
  • mBART25 uses all languages during pre-training, but the other settings contain at least one unseen language.
  • Doc-MT: For both cases (En-De, En-Zh), the mBART25 Doc-MT models outperform the same models fine-tuned at sentence level by a clear margin; the opposite holds for models without pre-training
  • For both datasets, randomly initialized Doc-MT models fail to work, producing much worse results than the sentence-level models.
  • The authors apply the same procedure to randomly initialized models without pre-training, which always ends up with ≈0 BLEU
  • This indicates that multilingual pre-training is essential and produces universal representations across languages, so that once the model learns to translate one language to En, it learns to translate all languages with similar representations
Conclusion
  • The authors demonstrate that multilingual denoising pre-training significantly improves both supervised and unsupervised machine translation at both the sentence level and the document level.
  • The authors analyze when and how pre-training is most effective and how it can be combined with other approaches such as back-translation (sketched after this list).
  • The authors' results show the transfer learning ability of the learned representations from multilingual pre-training.
  • The authors will scale up the current pre-training to more languages, e.g., an mBART100 model.
  • The size of the model makes it expensive to deploy in production; future work will explore pre-training more efficient models.
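Because back-translation recurs throughout the paper (as a baseline, as a complement to pre-training, and as the engine of the unsupervised experiments), the schematic below shows one back-translation round. The helper functions translate and train_step are hypothetical stand-ins for beam-search decoding and a seq2seq training update; this is a sketch of the general technique, not the paper's implementation.

# Schematic of one back-translation round (general technique, not the paper's code).
# `translate` and `train_step` are hypothetical stand-ins for beam-search decoding
# and one gradient update of a sequence-to-sequence model.

def back_translation_round(fwd_model, bwd_model, tgt_monolingual, translate, train_step):
    """Improve the forward (src->tgt) model using target-side monolingual text."""
    for tgt_batch in tgt_monolingual:
        # 1) Back-translate: the backward (tgt->src) model produces synthetic sources.
        synthetic_src = translate(bwd_model, tgt_batch)
        # 2) Train the forward model on (synthetic source, real target) pairs.
        train_step(fwd_model, src=synthetic_src, tgt=tgt_batch)
    return fwd_model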
Summary
  • Introduction:

    Despite its wide adoption for other NLP tasks (Devlin et al., 2019; Liu et al., 2019; Yang et al., 2019; Lewis et al., 2019; Raffel et al., 2019), self-supervised pre-training is not yet common practice in machine translation (MT).
  • Existing MT approaches only pre-train parts of the model, including the encoder (Lample and Conneau, 2019) and the decoder (Edunov et al., 2019), use pre-training objectives that only reconstruct parts of the text (Song et al., 2019), or focus only on English corpora (Lewis et al., 2019; Raffel et al., 2019).
  • Different from other pre-training approaches for MT (Lample and Conneau, 2019; Song et al., 2019), mBART pre-trains a complete autoregressive Seq2Seq model. mBART is trained once for all languages, providing a set of parameters that can be fine-tuned for any language pair in both supervised and unsupervised settings, without any task-specific or language-specific modifications or initialization schemes.
  • Objectives:

    We (1) assume access to a noising function g that corrupts text, and (2) train the model to predict the original text X given g(X); a toy sketch of such a noising function follows below.
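The toy sketch below illustrates one plausible form of such a noising function g: sentence permutation plus span masking. The 35% masking budget and Poisson(3.5) span lengths follow the paper's setup, but the whitespace tokenization and the literal "<mask>" string are simplified stand-ins for the real sentencepiece vocabulary.

# Toy sketch of a BART-style noising function g(X): sentence permutation plus
# span masking. The 35% mask budget and Poisson(lambda=3.5) span lengths follow
# the paper's setup; whitespace tokenization and the "<mask>" symbol are
# simplified stand-ins for the real sentencepiece vocabulary.
import numpy as np

def add_noise(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    rng = np.random.default_rng(seed)

    # (1) Permute the order of the sentences in the instance.
    order = rng.permutation(len(sentences))
    tokens = " ".join(sentences[i] for i in order).split()

    # (2) Replace spans of tokens with a single <mask> until ~35% are masked.
    budget = int(mask_ratio * len(tokens))
    while budget > 0 and tokens:
        span = min(max(1, rng.poisson(poisson_lambda)), budget, len(tokens))
        start = rng.integers(0, len(tokens) - span + 1)
        tokens[start:start + span] = ["<mask>"]
        budget -= span
    return " ".join(tokens)

# The model is then trained to reconstruct the original instance X from add_noise(X),
# with a language id token marking which of the pre-training languages the instance is in.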
Tables
  • Table1: Languages and Statistics of the CC25 Corpus. A list of 25 languages ranked by monolingual corpus size. Throughout this paper, we replace the language names with their ISO codes for simplicity. (*) The Chinese and Japanese corpora are not segmented, so the token counts here are sentence counts
  • Table2: Low/Medium-Resource Machine Translation. Pre-training consistently improves over a randomly initialized baseline, with particularly large gains on low-resource language pairs (e.g., Vi-En)
  • Table3: High Resource Machine Translation where all the datasets are from their latest WMT competitions. We only evaluate our models on En-X translation
  • Table4: Comparison with Other Pre-training Approaches on WMT16 Ro-En
  • Table5: Pretraining Languages on En-X translation. The size refers to the size of monolingual data for X. The size of En is shown as reference. All the pretrained models were controlled to see the same number of English instances during training
  • Table6: Comparison with Back-Translation on My-En translation using the same monolingual data. We also estimate the computational costs for both pre-training and back-translation based on Nvidia V100 GPUs
  • Table7: Generalization to Unseen Languages Language transfer results, fine-tuning on language-pairs without pre-training on them. mBART25 uses all languages during pre-training, while other settings contain at least one unseen language pair. For each model, we also show the gap to mBART25 results
  • Table8: Statistics for the Document-level Corpus of WMT19 En-De and TED15 Zh-En. # of instances is the number of training examples for the document-level model
  • Table9: Document-Level Machine Translation on En-De and Zh-En. (×) The randomly initialized Doc-MT model cannot produce translations aligned to the original sentences, so only document evaluation is possible
  • Table10: Unsupervised MT via Back-Translation. En-De, En-Ro are initialized by mBART02, while En-Ne, En-Si are initialized by mBART25. Our models are trained on monolingual data used in pre-training
  • Table11: Unsupervised MT via Language Transfer on X-En translations. The model fine-tuned on one language pair is directly tested on another. We use gray color to show the direct fine-tuning results, and lightgray color to show language transfer within similar language groups. We bold the highest transferring score for each pair
  • Table12: Back-Translation v.s. Language Transfer for Unsupervised MT. We present the best transferring scores together with the pairs transferred from
Related work
  • Pre-training for Text Generation This work builds on the recent success of self-supervised pre-training for NLP applications (Peters et al., 2018; Radford et al., 2018; Devlin et al., 2019; Yang et al., 2019; Liu et al., 2019), especially for text generation tasks (Radford et al., 2019; Song et al., 2019; Dong et al., 2019; Raffel et al., 2019; Lewis et al., 2019), where different self-supervised objectives are designed for training big neural models on enormous unlabeled text corpora. The pre-trained models are usually used as the initialization for fine-tuning on various downstream tasks such as controllable language modeling (Shirish Keskar et al., 2019), machine translation (Song et al., 2019), summarization (Liu and Lapata, 2019), and dialogue generation (Zhang et al., 2019). In contrast to most prior work, we focus on a deep exploration of applying denoising pre-training to various translation applications.

    Multilinguality in NLP tasks This work is also related to the continuing trend of multilingual language learning, including aligning multilingual word embeddings (Mikolov et al., 2013; Chen and Cardie, 2018; Lample et al., 2018b) into a universal space, and learning cross-lingual models (Wada and Iwata, 2018; Lample and Conneau, 2019; Conneau et al., 2019) to exploit shared representations across languages.

    For machine translation, the most relevant field is multilingual translation (Firat et al., 2016; Viégas et al., 2016; Aharoni et al., 2019; Arivazhagan et al., 2019), where the ultimate goal is to jointly train one translation model that translates multiple language directions at the same time and shares representations to improve the translation performance on low-resource languages (Gu et al., 2018). In this paper, we mainly focus on multilingualism in the pre-training stage and fine-tune the learned model in the standard bilingual scenario. Compared to multilingual translation, we do not require parallel data across multiple languages, only for the targeted direction, which potentially improves the scalability to low-resource languages and specific domains. Moreover, multilingual pre-training is unlikely to suffer from the interference problems between dissimilar languages that are typical for regular multilingual translation models.
References
  • Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. Massively multilingual neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3874–3884, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, and Yonghui Wu. 2019. Massively multilingual neural machine translation in the wild: Findings and challenges. CoRR, abs/1907.05019.
  • Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2017. Unsupervised neural machine translation. arXiv preprint arXiv:1710.11041.
  • Mikel Artetxe, Sebastian Ruder, and Dani Yogatama. 2019. On the cross-lingual transferability of monolingual representations.
  • Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. Wit3: Web inventory of transcribed and translated talks. In Conference of European Association for Machine Translation, pages 261–268.
  • Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2015. The IWSLT 2015 evaluation campaign. In International Workshop on Spoken Language Translation.
  • Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, and Marc’Aurelio Ranzato. 2019. Facebook AI's WAT19 Myanmar-English translation task submission. arXiv preprint arXiv:1910.06848.
  • Xilun Chen and Claire Cardie. 2018. Unsupervised multilingual word embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 261–270, Brussels, Belgium. Association for Computational Linguistics.
  • Yun Chen, Yang Liu, Yong Cheng, and Victor OK Li. 2017. A teacher-student framework for zero-resource neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1925–1935.
  • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
  • Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel R. Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. Xnli: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In North American Association for Computational Linguistics (NAACL).
  • Chenchen Ding, Hnin Thu Zar Aye, Win Pa Pa, Khin Thandar Nwet, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2019. Towards Burmese (Myanmar) morphological analysis: Syllable-based tokenization and part-of-speech tagging. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(1):5.
  • Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2018. NOVA: A feasible and flexible annotation system for joint tokenization and part-of-speech tagging. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2):17.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197.
  • Sergey Edunov, Alexei Baevski, and Michael Auli. 2019. Pre-trained language model representations for language generation. arXiv preprint arXiv:1903.09722.
  • Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-way, multilingual neural machine translation with a shared attention mechanism. In NAACL.
  • Jiatao Gu, Hany Hassan, Jacob Devlin, and Victor O.K. Li. 2018. Universal neural machine translation for extremely low resource languages. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 344–354, New Orleans, Louisiana. Association for Computational Linguistics.
  • Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor OK Li. 2019. Improved zero-shot neural machine translation via ignoring spurious correlations. arXiv preprint arXiv:1906.01181.
  • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, and Marc’Aurelio Ranzato. 2019. The FLORES evaluation datasets for low-resource machine translation: Nepali– English and Sinhala–English. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6097–6110, Hong Kong, China. Association for Computational Linguistics.
  • Sébastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. 2017. Does neural machine translation benefit from larger context? CoRR, abs/1704.05135.
  • Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351.
  • Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium. Association for Computational Linguistics.
  • Anoop Kunchukuttan, Pratik Mehta, and Pushpak Bhattacharyya. 2017. The IIT Bombay English-Hindi parallel corpus. CoRR, abs/1710.02855.
  • Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.
  • Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations.
  • Guillaume Lample, Alexis Conneau, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018b. Word translation without parallel data. In International Conference on Learning Representations.
  • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018c. Phrase-based & neural unsupervised machine translation. arXiv preprint arXiv:1804.07755.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Liangyou Li, Xin Jiang, and Qun Liu. 2019. Pretrained language models for document-level neural machine translation. arXiv preprint arXiv:1911.03110.
  • Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. arXiv preprint arXiv:1908.08345.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, and James Henderson. 2018. Document-level neural machine translation with hierarchical attention networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2947–2954, Brussels, Belgium. Association for Computational Linguistics.
  • Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. FAIRSEQ: A fast, extensible toolkit for sequence modeling. In North American Association for Computational Linguistics (NAACL): System Demonstrations.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In North American Association for Computational Linguistics (NAACL).
  • Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Belgium, Brussels. Association for Computational Linguistics.
  • Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, and Jonathan May. 2019. Translating translationese: A two-step approach to unsupervised machine translation. In ACL.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Technical report, OpenAI.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Edinburgh neural machine translation systems for wmt 16. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 371–376.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016b. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86–96, Berlin, Germany. Association for Computational Linguistics.
  • Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caiming Xiong, and Richard Socher. 2019. Ctrl: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
  • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked sequence to sequence pre-training for language generation. In International Conference on Machine Learning (ICML).
  • Jörg Tiedemann and Yves Scherrer. 2017. Neural machine translation with extended context. In Proceedings of the Third Workshop on Discourse in Machine Translation, pages 82–92, Copenhagen, Denmark. Association for Computational Linguistics.
  • Zhaopeng Tu, Yang Liu, Shuming Shi, and Tong Zhang. 2018. Learning to remember translation history with a continuous cache. Transactions of the Association for Computational Linguistics, 6:407–420.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems.
  • Fernanda Viégas, Greg Corrado, Jeffrey Dean, Macduff Hughes, Martin Wattenberg, Maxim Krikun, Melvin Johnson, Mike Schuster, Nikhil Thorat, Quoc V Le, et al. 2016. Google’s multilingual neural machine translation system: Enabling zero-shot translation.
  • Takashi Wada and Tomoharu Iwata. 2018. Unsupervised cross-lingual word embedding by multilingual neural language models. CoRR, abs/1809.02306.
  • Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. 2017. Exploiting cross-sentence context for neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2826–2831, Copenhagen, Denmark. Association for Computational Linguistics.
  • Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzman, Armand Joulin, and Edouard Grave. 2019. Ccnet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359.
  • Jiawei Wu, Xin Wang, and William Yang Wang. 2019a. Extract and edit: An alternative to back-translation for unsupervised neural machine translation. arXiv preprint arXiv:1904.02331.
  • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, and Tie-Yan Liu. 2019b. Machine translation with weakly paired bilingual documents.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2019. Dialogpt: Large-scale generative pre-training for conversational response generation.
  • For all our tasks, we use BLEU scores (Papineni et al., 2002) as the automatic metric to evaluate translation performance. Normally, we compute BLEU scores over tokenized text for both the system outputs and the references, applying language-wise tokenization to the translations. Note that, since we work directly on raw texts, we automatically obtain de-tokenized output after recovering the sentence-piece subwords. Following the literature, the language-wise tokenization instructions are as follows (a small BLEU-scoring sketch is given after these notes):
  • My: We use the official segmentation tool provided by Ding et al. (2019) for Burmese.
  • Ro: Following Sennrich et al. (2016a), we apply Moses tokenization and special normalization for Romanian texts.9
  • Zh: We use the official sacreBLEU (Post, 2018)10 Chinese tokenizer (--tok zh).
  • 5 https://anoopkunchukuttan.github.io/indic_nlp_library/
    6 http://www.phontron.com/kytea/
    7 http://konlpy.org/en/v0.3.0/install/
    8 http://alt.qcri.org/tools/arabic-normalizer/
    9 https://github.com/rsennrich/wmt16-script
    10 https://github.com/mjpost/sacreBLEU
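For reference, the snippet below scores toy hypothesis/reference pairs with the sacreBLEU Python API; the sentences are made up, and tokenize="zh" corresponds to the --tok zh option mentioned for Chinese above. This is a minimal sketch of the scoring call, not the paper's evaluation script.

# Minimal BLEU-scoring sketch with sacreBLEU (Post, 2018); the hypothesis and
# reference strings are made-up examples, not data from the paper.
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat sat on the mat"]]  # one reference stream, parallel to the hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)  # default "13a" tokenization
bleu_zh = sacrebleu.corpus_bleu(["猫坐在垫子上"], [["猫坐在垫子上"]],
                                tokenize="zh")        # Chinese tokenizer (--tok zh)
print(bleu.score, bleu_zh.score)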