Dialogue State Induction Using Neural Latent Variable Models

IJCAI 2020, pp. 3845-3852.

Keywords:
neural latent variable model, evidence lower bound, dialogue record, user utterance, customer service

Abstract:

Dialogue state modules are a useful component in a task-oriented dialogue system. Traditional methods find dialogue states by manually labeling training corpora, upon which neural models are trained. However, the labeling process can be costly, slow, error-prone, and more importantly, cannot cover the vast range of domains in real-world...
Introduction
  • Dialogue state modules are a central component of a task-oriented dialogue system [Wen et al., 2017; Lei et al., 2018].
  • The dialogue state consists of two parts, inform and request, where inform represents the search constraints expressed by the user and request represents the search target that the user is asking for.
  • In this example, the user intention is to reserve a restaurant.
  • The dialogue state represents what the user is looking for at the current turn of the conversation
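The inform/request state described above can be made concrete with a small sketch. The `update_state` helper and the slot names below are illustrative assumptions, not the paper's implementation:

```python
# A minimal sketch of a turn-level dialogue state, assuming the
# inform/request structure described above (slot names are illustrative).

def update_state(state, inform=None, request=None):
    """Merge new user constraints and requests into the running state."""
    new_state = {
        "inform": dict(state.get("inform", {})),
        "request": set(state.get("request", set())),
    }
    if inform:
        new_state["inform"].update(inform)    # search constraints, e.g. food=Turkish
    if request:
        new_state["request"].update(request)  # search targets, e.g. phone
    return new_state

# Turn 1: "I want an expensive restaurant that serves Turkish food."
state = update_state({}, inform={"food": "Turkish", "price": "expensive"})
# Turn 2: the user asks for the phone number.
state = update_state(state, request={"phone"})
print(state["inform"])  # {'food': 'Turkish', 'price': 'expensive'}
```

Each turn accumulates constraints into `inform` and requested targets into `request`, mirroring the "what the user is looking for at the current turn" reading above.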
Highlights
  • Dialogue state modules are a central component of a task-oriented dialogue system [Wen et al., 2017; Lei et al., 2018]
  • An example is shown in Figure 1: given two turns of a dialogue, the first user utterance is “I want an expensive restaurant that serves Turkish food.”, and the dialogue state consists of the corresponding inform slot-value pairs
  • We introduce two neural latent variable models for dialogue state induction by treating the whole state and each slot as latent variables, from which values observed in dialogue data are generated
  • The joint goal accuracy is significantly lower than the other metrics, which shows that this metric can be overly strict in our unsupervised setting. This is consistent with recent work on cross-lingual dialogue state tracking [Liu et al., 2019b], which shows that the joint goal accuracy of a cross-lingual Dialogue State Tracking model can be as low as 11% even with cross-lingual contextualized embeddings
  • We proposed a novel task of dialogue state induction, which is to automatically identify dialogue state slots and values over a large set of dialogue records
  • Results on standard Dialogue State Tracking datasets show that the models can effectively induce meaningful dialogue states from raw dialogue data, and further improve the results of a dialogue system compared to one without dialogue states
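The keywords mention the evidence lower bound, the standard training objective for neural latent variable models like those above. A generic Monte-Carlo ELBO estimate for a diagonal-Gaussian posterior against a standard normal prior (a textbook sketch with the reparameterization trick, not the paper's exact objective) can be written as:

```python
import math
import random

def elbo(mu, log_var, log_px_given_z, n_samples=1):
    """Monte-Carlo estimate of the evidence lower bound for a diagonal
    Gaussian posterior q(z|x) = N(mu, exp(log_var)) and prior p(z) = N(0, I):

        ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z))
    """
    # Closed-form KL divergence between the diagonal Gaussian posterior
    # and the standard normal prior.
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                   for m, lv in zip(mu, log_var))
    # Reparameterized samples z = mu + sigma * eps, with eps ~ N(0, 1),
    # so gradients could flow through mu and log_var.
    recon = 0.0
    for _ in range(n_samples):
        z = [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
        recon += log_px_given_z(z)
    return recon / n_samples - kl
```

With `mu = 0` and `log_var = 0`, the KL term vanishes and the ELBO reduces to the expected reconstruction likelihood, which is a quick sanity check for implementations of this objective.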
Methods
  • The closest prior work in spirit: Chen et al. [2013] used FrameNet-style frame-semantic parsers to induce slots from user utterances; Shi et al. [2018] proposed the auto-dialabel framework, which clusters noun words into slots.
  • The authors' work is different in two main aspects.
  • Given the utterance “I would like a guesthouse rather than a star hotel.”, the user intent is to book a hotel, the slots include hotel type=guesthouse and hotel type=star hotel, and the dialogue state is inform.
  • Given the sentence “I want a flight from Chicago to Dallas”, the user intent is to book a flight, the slots include city=Chicago and city=Dallas, and the dialogue state is inform.
  • The authors consider a deep neural model with hidden variables and contextualized embeddings, which adapts better to the multi-domain scenario
Results
  • DSI performance: the DSI results are shown in Table 1.
  • The joint goal F1-score reaches 44.8% on the MultiWOZ dataset, which shows that the model achieves promising performance without any labeled training data
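The contrast noted above between the strict joint goal metric and slot-level scores can be illustrated with a minimal sketch, assuming each turn's state is represented as a set of (slot, value) pairs; these are the standard metric definitions, not the paper's evaluation code:

```python
def joint_goal_accuracy(preds, golds):
    """Fraction of turns whose *entire* predicted state matches the gold state."""
    exact = sum(1 for p, g in zip(preds, golds) if p == g)
    return exact / len(golds)

def slot_f1(preds, golds):
    """Micro-averaged F1 over individual (slot, value) pairs across turns."""
    tp = fp = fn = 0
    for p, g in zip(preds, golds):
        tp += len(p & g)  # pairs predicted and gold
        fp += len(p - g)  # pairs predicted but not gold
        fn += len(g - p)  # gold pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

gold = [{("food", "turkish"), ("price", "expensive")}, {("area", "centre")}]
pred = [{("food", "turkish"), ("price", "expensive")}, {("area", "north")}]
print(joint_goal_accuracy(pred, gold))  # 0.5: turn 2 fails entirely on one wrong value
```

A single wrong value zeroes out a whole turn under joint goal accuracy while only costing one pair under F1, which is why the joint metric can look dramatically lower in an unsupervised setting.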
Conclusion
  • The authors proposed a novel task of dialogue state induction, which is to automatically identify dialogue state slots and values over a large set of dialogue records.
  • The authors' task is more useful in practice for handling the large variety of services available in industry, where manual labeling of dialogue states does not scale.
  • Results on standard DST datasets show that the models can effectively induce meaningful dialogue states from raw dialogue data, and further improve the results of a dialogue system compared to one without dialogue states.
  • The authors' methods can serve as baselines for further research on the task
Tables
  • Table1: Overall results of DSI
  • Table2: Hyper-parameter settings
  • Table3: Empirical results on MultiWOZ dialogue act prediction and response generation
  • Table4: Turn goal accuracy per domain
Related work
  • The role of DST and DSI in task-oriented dialogue systems. Task-oriented dialogue systems are complex, traditionally involving a pipeline of multiple steps, including automatic speech recognition (ASR) [Wen et al., 2017], spoken language understanding (SLU) [Qin et al., 2019], dialogue state tracking (DST) [Zhong et al., 2018], policy learning, and natural language generation (NLG) [Chen et al., 2019]. SLU consists of two main sub-tasks, namely intent detection, which identifies the user intent such as hotel booking, and slot tagging, which identifies relevant semantic slots in a user utterance, such as price and stars. Dialogue state tracking aims to identify user goals at every turn of the dialogue, such as inform(price=moderate, stars=4); request(phone), and is the core component of a task-oriented dialogue system. Policy learning aims to learn the system action based on the current state. Natural language generation transforms the system action into natural language.
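The pipeline described above can be sketched as a chain of function calls; every component below is a hypothetical toy stub standing in for a real model, and the ASR step is omitted by assuming text input:

```python
# Toy sketch of the classic task-oriented pipeline: SLU -> DST -> policy -> NLG.
# All outputs here are hard-coded stand-ins, not real models.

def slu(utterance):
    """Spoken language understanding: intent detection + slot tagging."""
    return {"intent": "find_restaurant",
            "slots": {"food": "Turkish", "price": "expensive"}}

def dst(state, slu_output):
    """Dialogue state tracking: accumulate user goals across turns."""
    state = dict(state)
    state.update(slu_output["slots"])
    return state

def policy(state):
    """Policy learning: pick a system action for the current state."""
    return ("request", "area") if "area" not in state else ("inform", "name")

def nlg(action):
    """Natural language generation: verbalise the system action."""
    act, slot = action
    return f"What {slot} would you like?" if act == "request" else "Here it is."

state = dst({}, slu("I want an expensive restaurant that serves Turkish food."))
print(nlg(policy(state)))  # "What area would you like?"
```

The point of the sketch is the interface, not the internals: DST sits between understanding and policy, which is why replacing its supervised annotation with induced states (the DSI task) affects the whole downstream pipeline.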
Funding
  • We would like to acknowledge funding support from National Natural Science Foundation of China under Grant No.61976180 and the Westlake University and Bright Dream Joint Institute for Intelligent Robotics
References
  • [Budzianowski et al., 2018] Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gasic. MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In EMNLP, 2018.
  • [Chen et al., 2013] Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky. Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing. In ASRU Workshop, 2013.
  • [Chen et al., 2019] Wenhu Chen, Jianshu Chen, Pengda Qin, Xifeng Yan, and William Yang Wang. Semantically conditioned dialog response generation via hierarchical disentangled self-attention. In ACL, 2019.
  • [Cui and Zhang, 2019] Leyang Cui and Yue Zhang. Hierarchically-refined label attention network for sequence labeling. In EMNLP, 2019.
  • [Eric and Manning, 2017] Mihail Eric and Christopher D. Manning. A copy-augmented sequence-to-sequence architecture gives good performance on task-oriented dialogue. In EACL, 2017.
  • [Eric et al., 2019] Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, and Dilek Hakkani-Tur. MultiWOZ 2.1: Multi-domain dialogue state corrections and state tracking baselines. arXiv, 2019.
  • [Goel et al., 2019] Rahul Goel, Shachi Paul, and Dilek Hakkani-Tur. HyST: A hybrid approach for flexible and accurate dialogue state tracking. In Interspeech, 2019.
  • [Hemphill et al., 1990] Charles T. Hemphill, John J. Godfrey, and George R. Doddington. The ATIS spoken language systems pilot corpus. In HLT, 1990.
  • [Jiang et al., 2017] Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, and Hanning Zhou. Variational deep embedding: An unsupervised and generative approach to clustering. In IJCAI, 2017.
  • [Kingma and Welling, 2014] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In ICLR, 2014.
  • [Lei et al., 2018] Wenqiang Lei, Xisen Jin, Min-Yen Kan, Zhaochun Ren, Xiangnan He, and Dawei Yin. Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures. In ACL, 2018.
  • [Liu et al., 2019a] Xiao Liu, Heyan Huang, and Yue Zhang. Open domain event extraction using neural latent variable models. In ACL, 2019.
  • [Liu et al., 2019b] Zihan Liu, Genta Indra Winata, Zhaojiang Lin, Peng Xu, and Pascale Fung. Attention-informed mixed-language training for zero-shot cross-lingual task-oriented dialogue systems. In AAAI, 2019.
  • [Mrksic et al., 2015] Nikola Mrksic, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gasic, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. Multi-domain dialog state tracking using recurrent neural networks. In ACL, 2015.
  • [Qin et al., 2019] Libo Qin, Wanxiang Che, Yangming Li, Haoyang Wen, and Ting Liu. A stack-propagation framework with token-level intent detection for spoken language understanding. In EMNLP, 2019.
  • [Qin et al., 2020] Libo Qin, Xiao Xu, Wanxiang Che, Yue Zhang, and Ting Liu. Dynamic fusion network for multi-domain end-to-end task-oriented dialog. arXiv, 2020.
  • [Rastogi et al., 2019] Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. In AAAI, 2019.
  • [Ren et al., 2019] Liliang Ren, Jianmo Ni, and Julian McAuley. Scalable and accurate dialogue state tracking via hierarchical sequence generation. In EMNLP, 2019.
  • [Shi et al., 2018] Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang, and Lintao Zhang. Auto-dialabel: Labeling dialogue data with unsupervised learning. In EMNLP, 2018.
  • [Wen et al., 2017] Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. A network-based end-to-end trainable task-oriented dialogue system. In EACL, 2017.
  • [Wen et al., 2018] Haoyang Wen, Yijia Liu, Wanxiang Che, Libo Qin, and Ting Liu. Sequence-to-sequence learning for task-oriented dialogue with dialogue state representation. In COLING, 2018.
  • [Williams et al., 2014] Jason D. Williams, Matthew Henderson, Antoine Raux, Blaise Thomson, Alan Black, and Deepak Ramachandran. The dialog state tracking challenge series. AI Magazine, 2014.
  • [Wu et al., 2019a] Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, and Pascale Fung. Transferable multi-domain state generator for task-oriented dialogue systems. In ACL, 2019.
  • [Wu et al., 2019b] Chien-Sheng Wu, Richard Socher, and Caiming Xiong. Global-to-local memory pointer networks for task-oriented dialogue. In ICLR, 2019.
  • [Xing et al., 2003] Eric P. Xing, Michael I. Jordan, and Stuart Russell. A generalized mean field algorithm for variational inference in exponential families. In UAI, 2003.
  • [Young et al., 2010] Steve Young, Milica Gasic, Simon Keizer, Francois Mairesse, Jost Schatzmann, Blaise Thomson, and Kai Yu. The hidden information state model: A practical framework for POMDP-based spoken dialogue management. CSL, 2010.
  • [Zhang et al., 2019] Jian-Guo Zhang, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, and Caiming Xiong. Find or classify? Dual strategy for slot-value predictions on multi-domain dialog state tracking. arXiv, 2019.
  • [Zhong et al., 2018] Victor Zhong, Caiming Xiong, and Richard Socher. Global-locally self-attentive encoder for dialogue state tracking. In ACL, 2018.