Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog

ACL, pp. 6344-6354, 2020.

Keywords: spoken language understanding, training data, gas station, domain knowledge, global-to-local memory pointer mechanism
We investigate methods that can make explicit use of domain knowledge and introduce a shared-private network to learn shared and specific knowledge.

Abstract:

Recent studies have shown remarkable success in end-to-end task-oriented dialog systems. However, most neural models rely on large training data, which are only available for a certain number of task domains, such as navigation and scheduling. This makes it difficult to scale to a new domain with limited labeled data. However, there has been relatively little research on how to effectively use data from all domains to improve the performance of each domain and of unseen domains. To this end, we investigate methods that can make explicit use of domain knowledge and introduce a shared-private network to learn shared and specific knowledge. In addition, we propose a novel Dynamic Fusion Network (DF-Net), which automatically exploits the relevance between the target domain and each source domain. With little training data, our model can quickly adapt to a new domain.
Introduction
  • Task-oriented dialogue systems (Young et al, 2013) help users to achieve specific goals such as restaurant reservation or navigation inquiry.
  • End-to-end methods in the literature usually take the sequence-to-sequence (Seq2Seq) model to generate a response from a dialogue history (Eric and Manning, 2017; Eric et al, 2017; Madotto et al, 2018; Wen et al, 2018; Gangi Reddy et al, 2019; Qin et al, 2019b; Wu et al, 2019a).
  • Taking the dialogue in Figure 1 as an example, to answer the driver's query about the "gas station", the end-to-end dialogue system directly generates a system response given the query and a corresponding knowledge base (KB).
Highlights
  • Task-oriented dialogue systems (Young et al, 2013) help users to achieve specific goals such as restaurant reservation or navigation inquiry
  • To address the above issues, we further propose a novel Dynamic Fusion Network (DF-Net), which is shown in Figure 2 (d); a minimal sketch of the idea appears after this list
  • The results on the two datasets are shown in Table 2; we observe that: 1) the basic shared-private framework outperforms the best prior model, the global-to-local memory pointer mechanism (GLMP), on both datasets
  • Our model outperforms GLMP by 2.0% overall, 3.3% in the Navigate domain, 1.1% in the Weather domain, and 0.6% in the Schedule domain on the entity F1 metric, which indicates that considering the relevance between the target domain input and all domains is effective for enhancing the performance of each domain
  • We can see that our framework outperforms GLMP on all human evaluation metrics, which is consistent with the automatic evaluation
  • Our model can quickly adapt to a new domain with little annotated data
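Below is a minimal sketch of the shared-private encoding with a dynamic fusion layer described above, written in PyTorch. It is an illustration under assumptions rather than the authors' implementation: the GRU encoders, the softmax gate, and all names (`DynamicFusion`, `hidden_dim`, `num_domains`) are hypothetical, and the real DF-Net embeds this idea inside a full encoder-decoder dialog model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFusion(nn.Module):
    """Sketch: fuse domain-shared and dynamically weighted domain-specific features."""

    def __init__(self, hidden_dim: int, num_domains: int):
        super().__init__()
        # One shared encoder captures domain-general knowledge; one private
        # encoder per domain captures domain-specific knowledge.
        self.shared = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.private = nn.ModuleList(
            [nn.GRU(hidden_dim, hidden_dim, batch_first=True)
             for _ in range(num_domains)]
        )
        # Mixture-of-experts-style gate: scores the relevance of each domain
        # expert to the current input.
        self.gate = nn.Linear(hidden_dim, num_domains)
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim), an embedded dialogue history.
        shared_out, _ = self.shared(x)                      # domain-shared features
        expert_outs = torch.stack(
            [enc(x)[0] for enc in self.private], dim=-1
        )                                                   # (batch, seq, hid, domains)
        # Dynamically weight each domain's private features by relevance.
        weights = F.softmax(self.gate(shared_out), dim=-1)  # (batch, seq, domains)
        private_mix = (expert_outs * weights.unsqueeze(2)).sum(dim=-1)
        # Fuse shared and mixed private representations.
        return torch.tanh(self.fuse(torch.cat([shared_out, private_mix], dim=-1)))
```

Because every domain expert contributes through the learned gate, an utterance from a new or low-resource domain can still borrow knowledge from its most relevant source domains.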
Methods
  • 3.1 Datasets

    Two publicly available datasets are used in this paper: SMD (Eric et al., 2017) and an extension of MultiWOZ 2.1 (Budzianowski et al., 2018) in which the authors equip every dialogue with a corresponding KB. The detailed statistics are presented in Table 1.
  • The dropout ratio the authors use in the framework is selected from {0.1, 0.2} and the batch size from {16, 32}.
  • The authors use Adam (Kingma and Ba, 2015) to optimize the parameters in the model and adopt its suggested hyper-parameters.
  • All hyper-parameters are selected according to performance on the validation set; a sketch of this selection procedure follows this list.
  • More details about the hyper-parameters can be found in the Appendix (Table 5).
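As a concrete illustration of this selection procedure, here is a hedged sketch of the grid search over the candidate values above; `dev_entity_f1` is a hypothetical placeholder for a full training run, not the authors' script.

```python
import itertools
import random

def dev_entity_f1(dropout: float, batch_size: int) -> float:
    """Hypothetical stand-in for one full training run with Adam (Kingma and
    Ba, 2015) at its suggested defaults, scored on the validation set."""
    return random.random()  # replace with real training + evaluation

# Candidate values from the paper: dropout in {0.1, 0.2}, batch size in {16, 32};
# the configuration with the best validation entity F1 is kept.
best = max(itertools.product([0.1, 0.2], [16, 32]),
           key=lambda cfg: dev_entity_f1(*cfg))
print(f"selected dropout={best[0]}, batch size={best[1]}")
```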
Results
  • Following prior work (Eric et al., 2017; Madotto et al., 2018; Wen et al., 2018; Wu et al., 2019a; Qin et al., 2019b), the authors adopt the BLEU and Micro Entity F1 metrics to evaluate model performance; a simplified sketch of the entity F1 computation follows this list.
  • The results on the two datasets are shown in Table 2; the authors observe that: 1) the basic shared-private framework outperforms the best prior model, GLMP, on both datasets.
  • This indicates that combining domain-shared and domain-specific features enhances per-domain performance better than utilizing only the implicit domain-shared features.
  • In human evaluation, the framework outperforms GLMP on all metrics, which is consistent with the automatic evaluation
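For clarity, below is a simplified sketch of the Micro Entity F1 metric referenced above: true positives, false positives, and false negatives are accumulated over all responses before computing F1. The exact entity matching (e.g., canonicalization against the KB) follows the prior work's evaluation scripts and is abstracted away here.

```python
from collections import Counter

def micro_entity_f1(gold_entities: list[list[str]],
                    pred_entities: list[list[str]]) -> float:
    """Micro-averaged entity F1 over paired (gold, predicted) entity lists."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_entities, pred_entities):
        gold_counts, pred_counts = Counter(gold), Counter(pred)
        overlap = sum((gold_counts & pred_counts).values())  # correctly produced
        tp += overlap
        fp += sum(pred_counts.values()) - overlap  # predicted but not in gold
        fn += sum(gold_counts.values()) - overlap  # gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For example, gold entities ["gas_station"] against predictions ["gas_station", "200_alester_ave"] give precision 0.5 and recall 1.0, hence F1 of about 0.67.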
Conclusion
  • The authors propose a shared-private model to investigate explicit modeling of domain knowledge for multi-domain dialog.
  • A dynamic fusion layer is proposed to dynamically capture the correlation between a target domain and all source domains.
  • Experiments on two datasets show the effectiveness of the proposed models.
  • The authors thank Min Xu, Jiapeng Li, Jieru Lin and Zhouyang Li for their insightful discussions.
  • The authors thank all anonymous reviewers for their constructive comments.
  • This work also received support via a Westlake-BrightDreams Robotics research grant.
Tables
  • Table1: Statistics of datasets
  • Table2: Main results. The numbers with * indicate that the improvement of our framework over all baselines is statistically significant with p < 0.05 under t-test
  • Table3: Ablation tests on the SMD test set
  • Table4: Human evaluation of responses on the randomly selected dialogue history
  • Table5: Hyperparameters used for the SMD and MultiWOZ 2.1 datasets
Related work
  • Existing end-to-end task-oriented systems can be classified into two main classes. A first line of work trains a single model on the mixed multi-domain dataset. Eric et al. (2017) augment the vocabulary distribution by concatenating KB attention to generate entities. Lei et al. (2018) first integrate dialogue state tracking into end-to-end task-oriented dialog. Madotto et al. (2018) combine end-to-end memory networks (Sukhbaatar et al., 2015) with sequence generation. Gangi Reddy et al. (2019) propose a multi-level memory architecture which first addresses queries, followed by results, and finally each key-value pair within a result. Wu et al. (2019a) propose a global-to-local pointer mechanism to query the knowledge base (a simplified sketch follows this paragraph). Compared with these models, our framework can not only explicitly utilize domain-specific knowledge but also consider the varying relevance between domains.
  • A second line of work trains a model on each domain separately. Wen et al. (2018) leverage a dialogue state representation to retrieve the KB implicitly, and Qin et al. (2019b) first adopt a KB-retriever to explicitly query the knowledge base. These works consider only domain-specific features; in contrast, our framework explicitly leverages domain-shared features across domains.
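To make the contrast concrete, the following is a simplified, single-hop sketch of querying a KB memory with a global-to-local pointer in the spirit of GLMP (Wu et al., 2019a). The published model uses a multi-hop memory network and a per-slot global pointer, so every name and shape here is an assumption for illustration only.

```python
import torch
import torch.nn.functional as F

def kb_pointer(query: torch.Tensor,
               kb_memory: torch.Tensor,
               global_gate: torch.Tensor) -> torch.Tensor:
    """query: (hidden,); kb_memory: (num_entries, hidden) embedded KB triples;
    global_gate: (num_entries,) soft global filter over KB entries."""
    scores = kb_memory @ query        # local relevance of each KB entry
    scores = scores * global_gate     # global pointer softly filters entries
    return F.softmax(scores, dim=-1)  # pointer distribution over the KB

# Toy usage: 3 KB entries, hidden size 4.
memory = torch.randn(3, 4)
gate = torch.sigmoid(torch.randn(3))
print(kb_pointer(torch.randn(4), memory, gate))
```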
Funding
  • This work was supported by the National Natural Science Foundation of China (NSFC) via grants 61976072, 61632011, and 61772153.
References
  • Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. MultiWOZ - a large-scale multi-domain wizard-of-Oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5016–5026, Brussels, Belgium. Association for Computational Linguistics.
  • Mihail Eric, Lakshmi Krishnan, Francois Charette, and Christopher D. Manning. 2017. Key-value retrieval networks for task-oriented dialogue. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 37–49, Saarbrücken, Germany. Association for Computational Linguistics.
  • Mihail Eric and Christopher Manning. 2017. A copy-augmented sequence-to-sequence architecture gives good performance on task-oriented dialogue. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 468–473, Valencia, Spain. Association for Computational Linguistics.
  • Revanth Gangi Reddy, Danish Contractor, Dinesh Raghu, and Sachindra Joshi. 2019. Multi-level memory for task oriented dialogs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3744–3754, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Yaroslav Ganin and Victor Lempitsky. 2014. Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495.
  • Jiang Guo, Darsh Shah, and Regina Barzilay. 2018. Multi-source domain adaptation with mixture of experts. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4694–4703, Brussels, Belgium. Association for Computational Linguistics.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • Wenqiang Lei, Xisen Jin, Min-Yen Kan, Zhaochun Ren, Xiangnan He, and Dawei Yin. 2018. Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1437–1447, Melbourne, Australia. Association for Computational Linguistics.
  • Bing Liu and Ian Lane. 2017. Multi-domain adversarial learning for slot filling in spoken language understanding. arXiv preprint arXiv:1711.11310.
  • Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial multi-task learning for text classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1–10, Vancouver, Canada. Association for Computational Linguistics.
  • Andrea Madotto, Chien-Sheng Wu, and Pascale Fung. 2018. Mem2Seq: Effectively incorporating knowledge bases into end-to-end task-oriented dialog systems. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1468–1478, Melbourne, Australia. Association for Computational Linguistics.
  • Libo Qin, Wanxiang Che, Yangming Li, Haoyang Wen, and Ting Liu. 2019a. A stack-propagation framework with token-level intent detection for spoken language understanding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2078–2087, Hong Kong, China. Association for Computational Linguistics.
  • Libo Qin, Yijia Liu, Wanxiang Che, Haoyang Wen, Yangming Li, and Ting Liu. 2019b. Entity-consistent end-to-end task-oriented dialogue system with KB retriever. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 133–142, Hong Kong, China. Association for Computational Linguistics.
  • Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2440–2448.
  • Haoyang Wen, Yijia Liu, Wanxiang Che, Libo Qin, and Ting Liu. 2018. Sequence-to-sequence learning for task-oriented dialogue with dialogue state representation. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3781–3792, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • Chien-Sheng Wu, Richard Socher, and Caiming Xiong. 2019a. Global-to-local memory pointer networks for task-oriented dialogue. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
  • Haiming Wu, Yue Zhang, Xi Jin, Yun Xue, and Ziwen Wang. 2019b. Shared-private LSTM for multi-domain text classification. In Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9-14, 2019, Proceedings, Part II, pages 116–128.
  • Steve J. Young, Milica Gašić, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179.
  • Ye Zhang, Nan Ding, and Radu Soricut. 2018. SHAPED: Shared-private encoder-decoder for text style adaptation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1528–1538, New Orleans, Louisiana. Association for Computational Linguistics.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2018. Global-locally self-attentive encoder for dialogue state tracking. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1458–1467, Melbourne, Australia. Association for Computational Linguistics.