Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech

Mihir Kale
Scott Roy
Keywords:
neural machine translation, structured data, text corpora, unsupervised transfer, automatic metric

Abstract:

While there is a large body of research studying deep learning methods for text generation from structured data, almost all of it focuses purely on English. In this paper, we study the effectiveness of machine translation based pre-training for data-to-text generation in non-English languages. Since the structured data is generally expr...

Introduction
  • Data-to-Text refers to the process of generating accurate and fluent natural language text from structured data such as tables, lists, and graphs (Gatt and Krahmer, 2018). It has several applications, including generating weather and sports summaries and response generation in task-oriented dialogue systems.
  • Data-to-text can broadly be classified into two categories with respect to the nature of the output text: lexicalized and delexicalized.
  • In the lexicalized setting, models are trained to produce the full natural language text; the authors refer to these as lexicalized models.
  • In the delexicalized setting, slot values are replaced by placeholders and models are trained to produce output text containing these placeholders; the authors refer to these as delexicalized models.
  • For English, the final text is then obtained by copying slot values from the structured data into the corresponding placeholders (a minimal sketch of both settings follows this list).
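To make the distinction concrete, here is a minimal Python sketch of the two settings. The slot names, values, example sentences and the relexicalize helper are illustrative assumptions, not the paper's actual data or code.

```python
# A minimal sketch of the lexicalized vs. delexicalized settings for a single
# restaurant MR. Slot names, values and sentences are invented for illustration.

MR = {"name": "U Karla", "food": "Czech", "price_range": "cheap"}

LEXICALIZED = "U Karla serves cheap Czech food."            # full natural text
DELEXICALIZED = "<name> serves <price_range> <food> food."  # placeholder text

def relexicalize(template: str, mr: dict) -> str:
    """Copy slot values from the MR into the corresponding placeholders."""
    text = template
    for slot, value in mr.items():
        text = text.replace(f"<{slot}>", value)
    return text

# For English, copying values back recovers the full text exactly.
assert relexicalize(DELEXICALIZED, MR) == LEXICALIZED

# For Czech this simple copy step breaks down, because slot values must be
# inflected for case and gender, which is why fully lexicalized models matter.
```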
Highlights
  • Data-to-Text refers to the process of generating accurate and fluent natural language text from structured data such as tables, lists, and graphs (Gatt and Krahmer, 2018). It has several applications, including generating weather and sports summaries and response generation in task-oriented dialogue systems.
  • The system must take a meaning representation (MR) as input - in this case, a dialogue act and a list of key-value pairs describing a restaurant - and generate fluent text that is firmly grounded in the MR.
  • While unsupervised transfer learning performs better than no pre-training, pre-training via machine translation gives the best results by a large margin: nmt brings the Slot Error Rate (SER) down to just 2.38, a 20-point gain over mass, while improving the BLEU score by 8 points (a toy SER computation is sketched after this list).
  • In this work we investigated neural machine translation based transfer learning for data-to-text generation in non-English languages
  • Using Czech as a target language, we showed that such an approach is effective and surpasses the performance of unsupervised transfer learning
  • The approach can also be leveraged to improve the performance of delexicalized models.
  • It enables learning simple, fully lexicalized end-to-end models that perform on par with a sophisticated, linguistically informed pipelined system.
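The highlights quote a Slot Error Rate (SER) of 2.38. As a rough illustration of what such a metric measures, here is a toy computation assuming the common formulation SER = (missing + hallucinated slot mentions) / number of slots in the MR; the paper's exact definition and matching rules (e.g., handling inflected Czech forms) may differ, so this is only a sketch.

```python
# Toy Slot Error Rate (SER) check using exact substring matching.
# Real implementations must match inflected Czech surface forms.

def slot_error_rate(mr: dict, output: str, all_slot_values=None) -> float:
    # Slots from the MR whose value never shows up in the generated text.
    missing = sum(1 for v in mr.values() if v not in output)
    # Values that appear in the text but were not requested by the MR.
    hallucinated = 0
    if all_slot_values:
        hallucinated = sum(1 for v in all_slot_values
                           if v not in mr.values() and v in output)
    return (missing + hallucinated) / max(len(mr), 1)

mr = {"name": "U Karla", "food": "Czech"}
print(slot_error_rate(mr, "U Karla serves Italian food."))  # 0.5: 'Czech' missing
```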
Methods
  • Pre-training: The authors use the Czech-English parallel corpus provided by the WMT 2019 shared task.
  • In order to facilitate a fair comparison, the authors use this corpus for the unsupervised pre-training baselines as well.
  • This effectively results in 114 million monolingual sentences, split between English and Czech.
  • Table 1 lists all the slots that appear in the dataset, along with examples; a sketch of how MRs and MT sentence pairs can both be framed as text-to-text examples follows this list.
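As a concrete, assumed illustration of how the two training stages can share one seq2seq setup, the sketch below builds plain text-to-text pairs: Czech-English sentence pairs for MT pre-training, and linearized MRs paired with Czech references for fine-tuning. The linearization format, slot names and example sentences are hypothetical; the paper's exact input encoding may differ.

```python
# Sketch (not the authors' exact pipeline): cast both stages as (source, target)
# text pairs so a single sequence-to-sequence model and vocabulary can be used.

def linearize_mr(dialogue_act: str, slots: dict) -> str:
    """Flatten a dialogue act plus key-value slots into a single source string."""
    parts = [dialogue_act] + [f"{k} = {v}" for k, v in sorted(slots.items())]
    return " | ".join(parts)

# Stage 1: machine translation pre-training pairs (English -> Czech here).
pretrain_pair = (
    "The restaurant is in the city centre.",
    "Restaurace je v centru města.",
)

# Stage 2: data-to-text fine-tuning pairs (linearized MR -> Czech text).
finetune_pair = (
    linearize_mr("inform", {"name": "U Karla", "food": "Czech"}),
    "U Karla podává česká jídla.",
)

print(finetune_pair[0])  # inform | food = Czech | name = U Karla
```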
Results
  • The authors report results in Table 3
  • Recall that these models are trained to generate fully lexicalized output.
  • While unsupervised transfer learning performs better than no pre-training, pre-training via machine translation gives the best results by a large margin (a BLEU evaluation sketch follows this list).
  • Binmt slightly outperforms nmt and leads to further gains across all metrics.
  • These results give credence to the hypothesis that machine translation can be a strong pre-training objective for data-to-text generation in non-English languages
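Since BLEU figures prominently in the results and sacrebleu is cited for Table 10, here is a small sketch of a corpus-level BLEU computation with the sacrebleu package. The hypothesis and reference sentences are made up, and the paper's exact evaluation settings may differ.

```python
# Sketch: corpus-level BLEU with sacrebleu. Sentences are invented examples.
import sacrebleu

hypotheses = ["U Karla podává levná česká jídla."]
references = [["U Karla nabízí levná česká jídla."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```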
Conclusion
  • Conclusion and Future Work

    In this work the authors investigated neural machine translation based transfer learning for data-to-text generation in non-English languages.
  • Using Czech as a target language, the authors showed that such an approach is effective and surpasses the performance of unsupervised transfer learning.
  • Studying pre-training on a wide variety of languages, especially those with different scripts, is a direct line of future work.
  • Since this is mainly hindered by a lack of datasets, the authors hope to develop data-to-text corpora for other languages, including ones that are truly low-resource.
Tables
  • Table 1: Slots appearing in the NLG dataset
  • Table 2: Czech NLG dataset statistics. The unique MRs are counted after delexicalizing the slots
  • Table 3: Results. ↑ implies higher is better, while ↓ implies lower is better. We compute BLEU and SER
  • Table 4: Ratings of machine-generated output when compared to human-written gold text
  • Table 5: Human evaluations for accuracy and fluency
  • Table 6: Czech translation performance on the WMT 2019 development set
  • Table 7: NLG fine-tuning with low-resource NMT. The first column indicates the number of tokens used for pre-training
  • Table 8: Experiments with low-resource NLG
  • Table 9: Out-of-vocabulary test set. "unique" refers to the number of unique values the slot takes in the test set; "total" is the number of times the slot appears
  • Table 10: Results on delexicalized NLG. We compute sacrebleu on outputs provided to us by the authors (Dušek and Jurčíček, 2019)
Related work
  • Earlier work on NLG mainly studied rule-based, pipelined methods, but recent work favors end-to-end neural approaches. Wen et al. (2015) proposed the Semantically Controlled LSTM and were among the first to show the success of neural networks for this problem, with applications to task-oriented dialogue. Since then, some works have focused on alternative architectures - Liu et al. (2018)

    generate text by conditioning language models on tables, while Puduppully et al. (2019) propose to explicitly model entities present in the structured data. The findings of the E2E challenge (Dusek et al., 2018) show that standard seq2seq models with attention also perform well.

    With the advent of ELMo, BERT (Devlin et al., 2018) and GPT-2 (Radford et al., 2019), the unsupervised pre-training + fine-tuning paradigm has been shown to be remarkably effective, leading to improvements in NLP tasks like classification, question answering and spoken language understanding (Siddhant et al., 2019a). Results for generation tasks like summarization are also positive, albeit less dramatic. Song et al. (2019) propose the MASS technique and obtain state-of-the-art results for summarization and unsupervised machine translation (a toy illustration of the MASS masking objective follows this paragraph). Freitag and Roy (2018) show that denoising autoencoders can be leveraged for unsupervised language generation from structured data. Budzianowski and Vulic (2019) cast data-to-text as text-to-text generation and show that fine-tuning GPT language models can lead to performance competitive with architectures developed specifically for data-to-text. Chen et al. (2019) use language models to improve performance in the low-resource scenario.
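As a toy illustration of the MASS objective mentioned above (Song et al., 2019), the sketch below masks a contiguous span of an input sentence and exposes that span as the decoder target. This is a simplified picture of the pre-training data construction, not the authors' or the MASS authors' implementation.

```python
# Toy MASS-style example: mask a contiguous span on the encoder side and
# train the decoder to generate exactly that masked span.
import random

def mass_example(tokens, mask_ratio=0.5, mask_token="[MASK]"):
    span_len = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(0, len(tokens) - span_len + 1)
    encoder_input = tokens[:start] + [mask_token] * span_len + tokens[start + span_len:]
    decoder_target = tokens[start:start + span_len]
    return encoder_input, decoder_target

enc, dec = mass_example("restaurace je v centru města".split())
print(enc, dec)
```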
References
  • Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283.
  • Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. Massively multilingual neural machine translation. arXiv preprint arXiv:1903.00089.
  • Paweł Budzianowski and Ivan Vulić. 2019. Hello, it's GPT-2 - how can I help you? Towards the use of pretrained language models for task-oriented dialogue systems.
  • Zhiyu Chen, Harini Eavani, Yinyin Liu, and William Yang Wang. 2019. Few-shot NLG with pre-trained language model. arXiv preprint arXiv:1904.09521.
  • Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, and Heyan Huang. 2019. Cross-lingual natural language generation via pre-training. arXiv preprint arXiv:1909.10481.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • George Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research, pages 138–145. Morgan Kaufmann Publishers Inc.
  • Ondřej Dušek and Filip Jurčíček. 2019. Neural generation for Czech: Data and baselines. arXiv preprint arXiv:1910.05298.
  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. 2018. Findings of the E2E NLG challenge. arXiv preprint arXiv:1810.01170.
  • Markus Freitag and Scott Roy. 2018. Unsupervised natural language generation with denoising autoencoders. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3922–3929.
  • Albert Gatt and Emiel Krahmer. 2018. Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61:65–170.
  • Hiroaki Hayashi, Yusuke Oda, Alexandra Birch, Ioannis Konstas, Andrew Finch, Minh-Thang Luong, Graham Neubig, and Katsuhito Sudoh. 2019. Findings of the third workshop on neural generation and translation. arXiv preprint arXiv:1910.13299.
  • Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226.
  • Alon Lavie and Abhaya Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 228–231. Association for Computational Linguistics.
  • Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.
  • Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text generation by structure-aware seq2seq learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.
  • Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? arXiv preprint arXiv:1906.01502.
  • Matt Post. 2018. A call for clarity in reporting BLEU scores. arXiv preprint arXiv:1804.08771.
  • Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text generation with content selection and planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6908–6915.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
  • Sebastian Schuster, Sonal Gupta, Rushin Shah, and Mike Lewis. 2018. Cross-lingual transfer learning for multilingual task oriented dialog. arXiv preprint arXiv:1810.13327.
  • Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, and Francisco Guzmán. 2019a. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. arXiv preprint arXiv:1907.05791.
  • Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, and Armand Joulin. 2019b. CCMatrix: Mining billions of high-quality parallel sentences on the web. arXiv preprint arXiv:1911.04944.
  • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, et al. 2019. Lingvo: A modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295.
  • Anastasia Shimorina and Claire Gardent. 2018. Handling rare items in data-to-text generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 360–370.
  • Aditya Siddhant, Anuj Goyal, and Angeliki Metallinou. 2019a. Unsupervised transfer learning for spoken language understanding in intelligent agents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4959–4966.
  • Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, and Karthik Raman. 2019b. Evaluating the cross-lingual effectiveness of massively multilingual neural machine translation. arXiv preprint arXiv:1909.00437.
  • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. 2015. CIDEr: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4566–4575.
  • Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. arXiv preprint arXiv:1508.01745.
  • Sam Wiseman, Stuart M. Shieber, and Alexander M. Rush. 2017. Challenges in data-to-document generation. arXiv preprint arXiv:1707.08052.
  • Shijie Wu and Mark Dredze. 2019. Beto, Bentz, Becas: The surprising cross-lingual effectiveness of BERT.