Slot-consistent NLG for Task-oriented Dialogue Systems with Iterative Rectification Network

ACL, pp. 97-106, 2020.

Keywords:
language generation, input slot value, slot value, slot error rate, human evaluation
Weibo:
We have proposed the Iterative Rectification Network (IRN) to improve the slot consistency of general natural language generation systems.

Abstract:

Data-driven approaches using neural networks have achieved promising performances in natural language generation (NLG). However, neural generators are prone to make mistakes, e.g., neglecting an input slot value and generating a redundant slot value. Prior works refer to this as the hallucination phenomenon. In this paper, we study slot consiste…

Introduction
  • Natural Language Generation (NLG), as a critical component of task-oriented dialogue systems, converts a meaning representation, i.e., dialogue act (DA), into natural language sentences.
  • Traditional methods (Stent et al., 2004; Konstas and Lapata, 2013; Wong and Mooney, 2007) are mostly pipeline-based, dividing the generation process into sentence planning and surface realization.
  • Despite their robustness, they heavily rely on handcrafted rules and domain-specific knowledge.
  • Dušek and Jurčíček (2016) employ an attentive encoder-decoder model, which applies an attention mechanism over input slot-value pairs.
Highlights
  • Natural Language Generation (NLG), as a critical component of task-oriented dialogue systems, converts a meaning representation, i.e., dialogue act (DA), into natural language sentences
  • We propose Iterative Rectification Network (IRN) to improve slot consistency for general natural language generation systems
  • We employ policy-based reinforcement learning to enable training the models with discrete rewards that are consistent with the evaluation metrics (see the sketch after this list)
  • Extensive experiments show that the proposed model significantly outperforms previous methods. These improvements cover both correctness, measured by slot error rate, and naturalness, measured by BLEU score
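Because rewards such as slot consistency and BLEU are discrete and non-differentiable, the models are trained with policy-gradient reinforcement learning. The snippet below is a minimal sketch of REINFORCE with a reward baseline (Williams, 1992), not the authors' implementation; the reward values, the mean baseline, and the function name `reinforce_loss` are illustrative assumptions.

```python
import torch

def reinforce_loss(log_probs, rewards, baseline):
    """REINFORCE with a reward baseline (Williams, 1992).

    log_probs: (batch,) summed token log-probabilities of sampled sequences.
    rewards:   (batch,) discrete sequence-level rewards computed outside the
               graph, e.g. a weighted mix of slot-consistency and BLEU terms.
    baseline:  scalar or (batch,) value subtracted to reduce gradient variance.
    """
    advantage = (rewards - baseline).detach()
    # Minimizing this loss maximizes the expected reward.
    return -(advantage * log_probs).mean()

# Toy usage with made-up numbers (illustrative only).
log_probs = torch.tensor([-12.3, -9.8], requires_grad=True)
rewards = torch.tensor([0.7, 0.2])
baseline = rewards.mean()  # simple mean baseline for the sketch
loss = reinforce_loss(log_probs, rewards, baseline)
loss.backward()
```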
Methods
  • Ablation settings: IRN (+KNN); w/o IRN; w/o reward r_SC; w/o reward r_DS; w/o reward r_LM; w/o baseline BLEU; w/o Aggregation; w/o Bootstrapping

    The authors follow all baseline performances reported in Tran and Nguyen (2017b) and use the open-source toolkits RNNLG and TGen to build the NLG systems HLSTM, SCLSTM and TGen.
  • The authors reimplement the baselines ARoA and RALSTM since their source code is not available
Results
  • The authors first compare their model, i.e., IRN (+KNN), with all the strong baselines mentioned above.
  • Compared with the previous state-of-the-art model, RALSTM, it reduces slot error rates by factors of 1.45, 1.38, 1.45 and 1.80 on the SF Restaurant, SF Hotel, Laptop, and Television datasets, respectively
  • It also improves BLEU scores by 3.59%, 1.45%, 2.29% and 3.33% on these datasets, respectively.
  • To verify whether IRN helps improve slot consistency of general NLG models, the authors further equip strong baselines, including HLSTM, TGen and RALSTM, with IRN
  • The authors evaluate their performances on SF Restaurant and Television datasets.
  • As shown in Table 3, adding IRN consistently reduces ERRs and improves BLEU scores for all of these baselines (a minimal ERR computation is sketched below)
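The slot error rate (ERR) reported in these comparisons follows the standard definition from Wen et al. (2015b): the number of missing plus redundant slot values divided by the total number of slots in the dialogue act. Below is a minimal sketch, assuming a simplified placeholder-matching scheme in place of the full delexicalization pipeline used in the paper.

```python
def slot_error_rate(dialogue_act, generated):
    """ERR = (missing + redundant) / total slots, after Wen et al. (2015b).

    dialogue_act: dict of required slot -> value, e.g. {"name": "...", "area": "..."}.
    generated:    delexicalized output in which realized slots appear as
                  placeholders such as SLOT_NAME (a simplification here).
    """
    total = len(dialogue_act)
    if total == 0:
        return 0.0
    missing = sum(1 for slot in dialogue_act
                  if f"SLOT_{slot.upper()}" not in generated)
    # A slot is redundant each extra time its placeholder is realized.
    redundant = sum(max(generated.count(f"SLOT_{slot.upper()}") - 1, 0)
                    for slot in dialogue_act)
    return (missing + redundant) / total

da = {"name": "x", "area": "y"}
print(slot_error_rate(da, "SLOT_NAME is in SLOT_AREA near SLOT_AREA"))  # 0.5
```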
Conclusion
  • The authors have proposed Iterative Rectification Network (IRN) to improve slot consistency of general NLG systems.
  • In this method, retrieval-based bootstrapping is introduced to sample pseudo mistaken cases from the training corpus to enrich the original training data (see the sketch after this list).
  • Extensive experiments show that the proposed model significantly outperforms previous methods.
  • These improvements cover both correctness, measured by slot error rate, and naturalness, measured by BLEU score.
  • Human evaluation and a case study confirm the effectiveness of the proposed method.
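As a rough illustration of how pseudo mistaken cases could be constructed for such bootstrapping, the sketch below corrupts a correct delexicalized template by dropping or duplicating a slot placeholder, yielding (mistaken candidate, reference) pairs for a rectifier to fix. This is an assumption about the general idea, not the authors' exact retrieval-based procedure; the function `make_pseudo_mistake` and the placeholder format are hypothetical.

```python
import random

def make_pseudo_mistake(template_tokens, slot_prefix="SLOT_", rng=random):
    """Corrupt a correct delexicalized template to create a (mistaken
    candidate, reference) pair, mirroring the two common slot errors:
      - delete a slot placeholder    -> missing slot
      - duplicate a slot placeholder -> redundant slot
    """
    positions = [i for i, tok in enumerate(template_tokens)
                 if tok.startswith(slot_prefix)]
    corrupted = list(template_tokens)
    if not positions:
        return corrupted
    pos = rng.choice(positions)
    if rng.random() < 0.5:
        del corrupted[pos]                     # missing slot
    else:
        corrupted.insert(pos, corrupted[pos])  # redundant slot
    return corrupted

reference = "SLOT_NAME is a nice hotel in SLOT_AREA".split()
print(make_pseudo_mistake(reference))
```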
Tables
  • Table1: An example (including mistaken generations) extracted from the SF Hotel (Wen et al., 2015b) dataset. Errors are marked in colors (missing, misplaced)
  • Table2: Experiment results on four datasets for all baselines and our model. The improvements over all prior methods are statistically significant with p < 0.01 under a t-test
  • Table3: The up and down arrows mark the absolute performance improvements contributed by IRN
  • Table4: Ablation study of rewards (upper part) and training data algorithms (lower part)
  • Table5: Real user trial for generation quality evaluation on both informativeness and naturalness
  • Table6: A DA from the Television dataset and a candidate generated by HLSTM for this DA, together with the output template from each iteration of IRN. Slot errors are marked in colors (missing, misplaced)
Related work
  • Conventional approaches to the NLG task are mostly pipeline-based, dividing it into sentence planning and surface realisation (Dethlefs et al., 2013; Stent et al., 2004; Walker et al., 2002). Oh and Rudnicky (2000) introduce a class-based n-gram language model and a rule-based reranker. Ratnaparkhi (2002) addresses the limitations of n-gram language models by using more sophisticated syntactic dependency trees. Mairesse and Young (2014) employ a phrase-based generator that learns from a semantically aligned corpus. Despite their robustness, these models are costly to create and maintain as they heavily rely on handcrafted rules.

    Recent works (Wen et al., 2015b; Dušek and Jurčíček, 2016; Tran and Nguyen, 2017a) build data-driven models based on end-to-end learning. Wen et al. (2015a) combine two recurrent neural network (RNN) based models with a CNN reranker to generate the required utterances. Wen et al. (2015b) introduce a novel SC-LSTM with an additional reading cell to jointly learn the gating mechanism and language model. Dušek and Jurčíček (2016) present an attentive neural generator that applies an attention mechanism over the input DA. Tran and Nguyen (2017b,a) employ a refiner component to select and aggregate the semantic elements produced by the encoder. More recently, domain adaptation (Wen et al., 2016) and unsupervised learning (Bahuleyan et al., 2018) for NLG have also received much attention.
Funding
  • This work was supported by the National Natural Science Foundation of China (NSFC) via grants 61976072, 61632011 and 61772153.
Reference
  • Hareesh Bahuleyan, Lili Mou, Kartik Vamaraju, Hao Zhou, and Olga Vechtomova. 2018. Probabilistic natural language generation with wasserstein autoencoders. arXiv preprint arXiv:1806.08462.
  • Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, and Rajen Subba. 2019. Constrained decoding for neural NLG from compositional representations in task-oriented dialogue. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. To appear.
  • Nina Dethlefs, Helen Hastie, Heriberto Cuayahuitl, and Oliver Lemon. 2013. Conditional random fields for responsive surface realisation using global features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1254–1263, Sofia, Bulgaria. Association for Computational Linguistics.
  • Ondřej Dušek and Filip Jurčíček. 2016. Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. arXiv preprint arXiv:1606.05491.
  • Amanda Stent, Rashmi Prasad, and Marilyn Walker. 2004. Trainable sentence planning for complex information presentations in spoken dialog systems. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL04), pages 79–86, Barcelona, Spain.
  • Van-Khanh Tran and Le-Minh Nguyen. 2017a. Natural language generation for spoken dialogue system using rnn encoder-decoder networks. arXiv preprint arXiv:1706.00139.
  • Van-Khanh Tran and Le-Minh Nguyen. 2017b. Neuralbased natural language generation in dialogue using rnn encoder-decoder with semantic aggregation. arXiv preprint arXiv:1706.06714.
  • Marilyn A Walker, Owen C Rambow, and Monica Rogati. 2002. Training a sentence planner for spoken dialogue using boosting. Computer Speech & Language, 16(3-4):409–433.
  • Lex Weaver and Nigel Tao. 2001. The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pages 538–545. Morgan Kaufmann Publishers Inc.
  • Tsung-Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. 2015a. Stochastic language generation in dialogue using recurrent neural networks with convolutional sentence reranking. arXiv preprint arXiv:1508.01755.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M Rojas-Barahona, Pei-Hao Su, David Vandyke, and Steve Young. 2016. Multi-domain neural network language generation for spoken dialogue systems. arXiv preprint arXiv:1603.01232.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. 2015b. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. arXiv preprint arXiv:1508.01745.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.
  • Yuk Wah Wong and Raymond Mooney. 2007. Generation by inverting a semantic parser that uses statistical machine translation. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 172–179.
  • Chien-Sheng Wu, Richard Socher, and Caiming Xiong. 2019. Global-to-local memory pointer networks for task-oriented dialogue. In International Conference on Learning Representations.
  • Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. Deliberation networks: Sequence generation beyond one-pass decoding. In Advances in Neural Information Processing Systems, pages 1784–1794.