Dialog State Tracking with Reinforced Data Augmentation

AAAI Conference on Artificial Intelligence, 2020.

Keywords:
neural network, dialog state tracking, high quality, deep reinforcement learning, contextual bandit

Abstract:

Neural dialog state trackers are generally limited by the insufficient quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that can generate high-quality data to improve the neural state tracker. Specifically, we intr…

Introduction
  • With the increasing popularity of intelligent assistants such as Alexa, Siri, and Google Duplex, research on spoken dialog systems has gained a great deal of attention in recent years (Gao et al., 2018).
  • Based on the tracked dialogue state, the dialog agent decides how to converse with the user.
  • In a slot-based dialog system, the dialogue state is typically formulated as a set of slot-value pairs. One concrete example, with the tracked state after each user turn, follows (see also the code sketch after this list):

    User: Grandma wants Italian, any suggestions?
    State: inform(food=Italian)
    Agent: Would you prefer south or center?
    State: inform(food=Italian, price=cheap, area=don't care)
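The slot-value formulation maps naturally onto a dictionary that is updated turn by turn. A minimal sketch in Python (the update logic is illustrative, not the paper's tracker):

```python
# Minimal sketch: a dialog state as a dict of slot -> value pairs,
# merged with the slot-value pairs inferred from each new user turn.

def update_state(state: dict, turn_slots: dict) -> dict:
    new_state = dict(state)
    new_state.update(turn_slots)  # later turns overwrite earlier values
    return new_state

state = {}
state = update_state(state, {"food": "Italian"})
state = update_state(state, {"price": "cheap", "area": "don't care"})
print(state)  # {'food': 'Italian', 'price': 'cheap', 'area': "don't care"}
```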
Highlights
  • With the increasing popularity of intelligent assistants such as Alexa, Siri, and Google Duplex, research on spoken dialog systems has gained a great deal of attention in recent years (Gao et al., 2018)
  • The reinforced data augmentation (RDA) framework further boosts the performance of the competitive GLAD tracker by margins of 2.4% and 3.1% on the two datasets, respectively, achieving new state-of-the-art results (90.7% and 86.7%)
  • We conduct significance tests (t-tests), which show that the proposed RDA achieves significant improvements over baseline models (p < 0.01 for WoZ and p < 0.05 for MultiWoZ)
  • We conduct experiments with the GLAD tracker, which is evaluated on the validation set of WoZ; joint goal accuracy is used as the evaluation metric
  • We have proposed a reinforced data augmentation (RDA) method for dialogue state tracking that improves its performance by generating high-quality training data
  • The results show that our model consistently outperforms the strong baselines and achieves new state-of-the-art results
  • The Generator and the Tracker are learned alternately: the Generator is trained with rewards from the Tracker, while the Tracker is retrained and boosted with the new high-quality data augmented by the Generator (a schematic of this loop follows the list)
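Schematically, the alternating optimization can be summarized as below. The `ToyTracker` and `ToyGenerator` classes are hypothetical stand-ins so the sketch runs end to end; the paper's actual components are neural models:

```python
import random

# Schematic sketch of the RDA alternating loop with toy stand-ins
# (random rewards, identity-style augmentation), not the paper's models.

class ToyTracker:
    def fit(self, data):
        self.n = len(data)                            # pretend to train
    def evaluate_reward(self, instances):
        return [random.random() for _ in instances]   # stand-in reward signal

class ToyGenerator:
    def augment(self, data):
        return [x + " (paraphrased)" for x in data]   # stand-in augmentation
    def update_policy(self, instances, rewards):
        pass                                          # RL update would go here
    def best(self, data):
        return self.augment(data)                     # keep high-reward instances

def reinforced_data_augmentation(tracker, generator, train_data, rounds=3):
    for _ in range(rounds):
        augmented = generator.augment(train_data)         # Generator proposes data
        rewards = tracker.evaluate_reward(augmented)      # Tracker scores quality
        generator.update_policy(augmented, rewards)       # RL step on the Generator
        tracker.fit(train_data + generator.best(train_data))  # retrain the Tracker
    return tracker, generator

reinforced_data_augmentation(ToyTracker(), ToyGenerator(),
                             ["i want cheap italian food"])
```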
Methods
  • The authors compare the model with several baselines. The Delexicalised Model uses generic tags to replace slot values and employs a CNN for turn-level feature extraction and a Jordan RNN for state updates (Henderson et al., 2014b; Wen et al., 2017).
  • NBT-DNN and NBT-CNN use summation and convolutional filters, respectively, to learn representations for the user utterance, the candidate slot-value pair, and the system actions (Mrkšić et al., 2017).
  • They fuse these representations with a gating mechanism for the final prediction (see the sketch after this list).
  • GCE builds on GLAD by using global recurrent networks rather than the global-local modules (Nouri and Hosseini-Asl, 2018)
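To make the gating idea concrete, here is a minimal PyTorch-style sketch of fusing three representations with a learned gate; the structure, dimensions, and names are assumptions for illustration, not the NBT implementation:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Toy gate that fuses utterance, slot-value, and system-action
    representations into one vector, then scores the slot-value match."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(3 * dim, dim)   # gate computed from all inputs
        self.scorer = nn.Linear(dim, 1)       # final match score

    def forward(self, utt, slot_value, actions):
        g = torch.sigmoid(self.gate(torch.cat([utt, slot_value, actions], dim=-1)))
        fused = g * utt + (1.0 - g) * (slot_value + actions)
        return torch.sigmoid(self.scorer(fused))  # probability-like score

model = GatedFusion(dim=64)
score = model(torch.randn(1, 64), torch.randn(1, 64), torch.randn(1, 64))
print(score.shape)  # torch.Size([1, 1])
```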
Results
  • Results and Analyses

    The authors compare the model with baselines; joint goal accuracy is used as the evaluation metric (a code sketch of the metric follows this list).
  • The authors observe that GLAD achieves performance (88.3% and 83.6%) comparable to other state-of-the-art models on both datasets.
  • The RDA framework further boosts the performance of the competitive GLAD tracker by margins of 2.4% and 3.1% on the two datasets, respectively, achieving new state-of-the-art results (90.7% and 86.7%).
  • The authors conduct significance tests (t-tests), which show that the proposed RDA achieves significant improvements over baseline models (p < 0.01 for WoZ and p < 0.05 for MultiWoZ)
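Joint goal accuracy counts a turn as correct only when the entire predicted state matches the gold state exactly; a minimal sketch:

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose full predicted state equals the gold state.
    Each state is a dict of slot -> value; a turn scores 1 only if every
    slot-value pair matches."""
    correct = sum(1 for pred, gold in zip(predicted_states, gold_states)
                  if pred == gold)
    return correct / len(gold_states)

# Example: the second turn misses the 'area' slot, so accuracy is 0.5.
preds = [{"food": "Italian"}, {"food": "Italian", "price": "cheap"}]
golds = [{"food": "Italian"},
         {"food": "Italian", "price": "cheap", "area": "don't care"}]
print(joint_goal_accuracy(preds, golds))  # 0.5
```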
Conclusion
  • The authors have proposed a reinforced data augmentation (RDA) method for dialogue state tracking that improves its performance by generating high-quality training data.
  • The Generator and the Tracker are learned alternately: the Generator is trained with rewards from the Tracker, while the Tracker is retrained and boosted with the new high-quality data augmented by the Generator.
  • In future work, the authors plan to experiment on more NLP tasks and to introduce neural-network-based paraphrasing methods into the RDA framework
Tables
  • Table 1: Comparison of our model and other baselines. DA refers to the coarse-grained data augmentation without the reinforced framework, and Multi refers to the MultiWoZ (restaurant) dataset. t-tests are conducted for our proposed models, with the original trackers (NBT-CNN and GLAD) as comparison baselines. † and ‡: significant over the baseline trackers at the 0.05 and 0.01 levels, respectively. Means and standard deviations are also reported (a code sketch of such a test follows this list)
  • Table 2: Results with different sub-sampling ratios on WoZ and MultiWoZ (restaurant)
  • Table 3: Ablation study of performance on the test sets of WoZ and MultiWoZ
  • Table 4: Case study for the Generator policy. In each cell of Candidates Cp, the phrase with the maximum policy value is listed on the first line and the one with the minimum value on the second line
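Given per-run joint-goal accuracies for the augmented and baseline trackers, such a significance test can be run with SciPy. The numbers below are illustrative placeholders, and the paper does not specify paired vs. unpaired testing, so an independent two-sample t-test is one plausible reading:

```python
from scipy import stats

# Hypothetical joint-goal accuracies over repeated runs
# (illustrative only; not the paper's actual per-run numbers).
baseline = [88.1, 88.4, 88.0, 88.5, 88.3]
with_rda = [90.5, 90.9, 90.6, 90.8, 90.7]

t_stat, p_value = stats.ttest_ind(with_rda, baseline)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```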
Related work
  • Dialog State Tracking. DST is studied extensively in the literature (Williams et al., 2016). The methods can be classified into three categories: rule-based (Zue et al., 2000), generative (DeVault and Stone, 2007; Williams, 2008), and discriminative (Metallinou et al., 2013). The discriminative methods (Metallinou et al., 2013) treat dialog state tracking as a classification problem, designing a large number of features and optimizing the model parameters on the annotated data. Recently, neural-network-based models with different architectures have been applied to DST (Henderson et al., 2014b; Zhong et al., 2018). These models first employ CNNs (Wen et al., 2017), RNNs (Ramadan et al., 2018), or self-attention (Nouri and Hosseini-Asl, 2018) to learn representations for the user utterance and the system actions/response; various gating mechanisms (Ramadan et al., 2018) are then used to fuse the learned representations for prediction. Another difference among these neural models is the way parameters are shared: most use one shared global encoder for representation learning, while Zhong et al. (2018) pair each slot with a local encoder in addition to one shared global encoder. Although these neural-network-based trackers obtain state-of-the-art results, they are still limited by the insufficient amount and diversity of annotated data. To address this difficulty, we propose a data augmentation method that improves neural state trackers by adding high-quality generated instances as new training data.

    Examples from Table 4 (case study for the Generator policy); in each row, the first candidate has the maximum policy value and the second the minimum:

    Sentence x and text span p → Candidates Cp
    • Thanks, [could you give] me the phone number for the restaurant? → "i was wonder if you could provide"; "are you able to"
    • What restaurants are on the east side that are not [overpriced]? → "too expensive"; "cheap enough"
    • What is a affordable restaurant in the [south side part] of town? → "south end"; "southern countries"
    • I want Cuban food and i [do n't care] about the price range. → "do n't worry"; "do n't give a danm"
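The case-study examples above suggest how the Generator operates: it replaces a text span with a candidate paraphrase, preferring candidates with high policy scores. A minimal contextual-bandit-style sketch, where the scoring function and exploration scheme are placeholders rather than the paper's learned policy:

```python
import random

def augment_utterance(sentence: str, span: str, candidates: list,
                      policy_score, epsilon: float = 0.1) -> str:
    """Replace `span` in `sentence` with a candidate paraphrase.

    `policy_score(sentence, span, candidate)` stands in for a learned
    contextual-bandit policy; epsilon-greedy exploration occasionally
    tries lower-scored candidates.
    """
    if random.random() < epsilon:
        choice = random.choice(candidates)  # explore
    else:
        choice = max(candidates,
                     key=lambda c: policy_score(sentence, span, c))  # exploit
    return sentence.replace(span, choice, 1)

# Toy usage with a trivial scoring stub (real scores come from the policy).
score = lambda s, p, c: len(c)  # placeholder heuristic
print(augment_utterance(
    "Thanks, could you give me the phone number for the restaurant?",
    "could you give",
    ["i was wonder if you could provide", "are you able to"],
    score))
```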
References
  • Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In EMNLP, pages 5016–5026.
  • Asli Celikyilmaz, Antoine Bosselut, Xiaodong He, and Yejin Choi. 2018. Deep communicating agents for abstractive summarization. In NAACL, volume 1, pages 1662–1675.
  • Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. 2018. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501.
  • David DeVault and Matthew Stone. 2007. Managing ambiguities across utterances in dialogue. In Decalog, pages 49–56.
  • Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, and Jackie Chi Kit Cheung. 2018. BanditSum: Extractive summarization as a contextual bandit. In EMNLP, pages 3739–3748.
  • Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang. 2011. Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369.
  • Jun Feng, Minlie Huang, Li Zhao, Yang Yang, and Xiaoyan Zhu. 2018. Reinforcement learning for relation classification from noisy data.
  • Jianfeng Gao, Michel Galley, and Lihong Li. 2018. Neural approaches to conversational AI. arXiv preprint arXiv:1809.08267.
  • Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. 2018. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In NeurIPS, pages 8527–8537.
  • Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. 2017. A joint many-task model: Growing a neural network for multiple NLP tasks. In EMNLP, pages 1923–1933.
  • Matthew Henderson, Blaise Thomson, and Jason D. Williams. 2014a. The second dialog state tracking challenge. In SIGDIAL, pages 263–272.
  • Matthew Henderson, Blaise Thomson, and Steve Young. 2014b. Word-based dialog state tracking with recurrent neural networks. In SIGDIAL, pages 292–299.
  • Yutai Hou, Yijia Liu, Wanxiang Che, and Ting Liu. 2018. Sequence-to-sequence data augmentation for dialogue language understanding. In COLING, pages 1234–1245.
  • Dongyeop Kang, Tushar Khot, Ashish Sabharwal, and Eduard Hovy. 2018. Adversarial training for textual entailment with knowledge-guided examples. In ACL.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
  • Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. In Interspeech.
  • Sosuke Kobayashi. 2018. Contextual augmentation: Data augmentation by words with paradigmatic relations. In NAACL.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In NeurIPS, pages 1097–1105.
  • Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, and Jianfeng Gao. 2016. Deep reinforcement learning for dialogue generation. In EMNLP.
  • Zichao Li, Xin Jiang, Lifeng Shang, and Hang Li. 2018. Paraphrase generation with deep reinforcement learning. In EMNLP, pages 3865–3878.
  • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. In ICLR.
  • Angeliki Metallinou, Dan Bohus, and Jason Williams. 2013. Discriminative state tracking for spoken dialog systems. In ACL, pages 466–475.
  • Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. 2017. Neural belief tracker: Data-driven dialogue state tracking. In ACL, pages 1777–1788.
  • Karthik Narasimhan, Adam Yala, and Regina Barzilay. 2016. Improving information extraction by acquiring external evidence with reinforcement learning. In EMNLP, pages 2355–2365.
  • Elnaz Nouri and Ehsan Hosseini-Asl. 2018. Toward scalable neural dialogue state tracking model. arXiv preprint arXiv:1812.00899.
  • Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543.
  • Pengda Qin, Weiran Xu, and William Yang Wang. 2018a. Robust distant supervision relation extraction via deep reinforcement learning. In ACL, pages 2137–2147.
  • Pengda Qin, Weiran Xu, and William Yang Wang. 2018b. DSGAN: Generative adversarial training for distant supervision relation extraction. In ACL.
  • Osman Ramadan, Paweł Budzianowski, and Milica Gašić. 2018. Large-scale multi-domain belief tracking with knowledge sharing. In ACL, pages 432–437.
  • Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2015. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732.
  • Avik Ray, Yilin Shen, and Hongxia Jin. 2018. Robust spoken language understanding via paraphrasing. arXiv preprint arXiv:1809.06444.
  • Liliang Ren, Kaige Xie, Lu Chen, and Kai Yu. 2018. Towards universal dialogue state tracking. In EMNLP, pages 2780–2786.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically equivalent adversarial rules for debugging NLP models. In ACL, pages 856–865.
  • Sanuj Sharma, Prafulla Kumar Choubey, and Ruihong Huang. 2019. Improving dialogue state tracking by discerning the relevant context.
  • Satinder P. Singh, Michael J. Kearns, Diane J. Litman, and Marilyn A. Walker. 2000. Reinforcement learning for spoken dialogue systems. In NeurIPS, pages 956–962.
  • Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction, pages 329–331. MIT Press.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NeurIPS, pages 5998–6008.
  • Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2017. A network-based end-to-end trainable task-oriented dialogue system. In EACL, pages 438–449.
  • Jason Williams, Antoine Raux, and Matthew Henderson. 2016. The dialog state tracking challenge series: A review. Dialogue & Discourse, 7(3):4–33.
  • Jason Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. 2013. The dialog state tracking challenge. In SIGDIAL, pages 404–413.
  • Jason D. Williams. 2008. Exploiting the ASR n-best by tracking multiple dialog state hypotheses. In Interspeech.
  • Jiawei Wu, Lei Li, and William Yang Wang. 2018. Reinforced co-training. In NAACL, pages 1252–1262.
  • Wenhan Xiong, Thien Hoang, and William Yang Wang. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. In EMNLP, pages 564–573.
  • Kang Min Yoo, Youhyun Shin, and Sang-goo Lee. 2018. Data augmentation for spoken language understanding via joint variational generation. arXiv preprint arXiv:1809.02305.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In NeurIPS.
  • Shiqi Zhao, Xiang Lan, Ting Liu, and Sheng Li. 2009. Application-driven statistical paraphrase generation. In ACL, pages 834–842.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2018. Global-locally self-attentive encoder for dialogue state tracking. In ACL.
  • Victor Zue, Stephanie Seneff, James R. Glass, Joseph Polifroni, Christine Pao, Timothy J. Hazen, and Lee Hetherington. 2000. Jupiter: A telephone-based conversational interface for weather information. TASLP, pages 85–96.