Co-GAT: A Co-Interactive Graph Attention Network for Joint Dialog Act Recognition and Sentiment Classification
Keywords:
graph attention network, mutual interaction information, act recognition, contextual information, bidirectional LSTM
Abstract:
In a dialog system, dialog act recognition and sentiment classification are two correlative tasks to capture speakers' intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions, respectively. The dialog context information (contextual information) and the mutual interaction information are two key factors that contribute to the two tasks.
Introduction
- Dialog act recognition (DAR) and sentiment classification (SC) are two correlative tasks for correctly understanding speakers' utterances in a dialog system (Cerisara et al. 2018; Lin, Xu, and Zhang 2020; Qin et al. 2020a).
- SC can detect the sentiments in utterances, which helps to capture speakers' implicit intentions.
- There are two key factors that contribute to dialog act recognition and sentiment prediction.
- One is the mutual interaction information across the two tasks and the other is the contextual information across utterances in a dialog (see the adjacency sketch below).
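The paper itself includes no code, but as a rough illustration of how these two kinds of information can be wired into a single graph, the sketch below builds an adjacency matrix with one dialog-act node and one sentiment node per utterance. The cross-task edges and the same-speaker rule for cross-utterance edges follow the paper's descriptions; the node layout and function name are our own illustrative assumptions.

```python
import numpy as np

def build_co_interactive_adjacency(speakers):
    """Sketch of a (2n x 2n) adjacency matrix for a dialog with n utterances.

    Nodes 0..n-1 are dialog-act nodes; nodes n..2n-1 are sentiment nodes.
    """
    n = len(speakers)
    adj = np.eye(2 * n, dtype=int)  # self-loops keep each node in its own neighborhood
    for i in range(n):
        # cross-task connection: the act and sentiment nodes of utterance i
        adj[i, n + i] = adj[n + i, i] = 1
        for j in range(n):
            # cross-utterance connection: utterances from the same speaker
            if speakers[i] == speakers[j]:
                adj[i, j] = adj[n + i, n + j] = 1
    return adj

# Example: a two-speaker dialog with four utterances
print(build_co_interactive_adjacency(["A", "B", "A", "B"]))
```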
Highlights
- Dialog act recognition (DAR) and sentiment classification (SC) are two correlative tasks for correctly understanding speakers' utterances in a dialog system (Cerisara et al. 2018; Lin, Xu, and Zhang 2020; Qin et al. 2020a).
- Following Kim and Kim (2018), Cerisara et al. (2018), and Qin et al. (2020a), we adopt macro-averaged Precision, Recall, and F1 for both sentiment classification and dialog act recognition on the DailyDialog dataset, and we adopt the average of the dialog-act-specific F1 scores weighted by the prevalence of each dialog act on the Mastodon dataset.
- The first block of the table represents the separate models for the dialog act recognition task, while the second block denotes the separate models for the sentiment classification task.
- Our framework outperforms the state-of-the-art dialog act recognition and sentiment classification models trained on the separate tasks in all metrics on both datasets.
- We find that the integration of the Co-Interactive Graph Attention Network (Co-GAT) and RoBERTa/XLNet can further improve performance, demonstrating that the contributions of the two are complementary.
- On the Mastodon dataset, our model gains 3.0% and 1.9% improvements in F1 score on the SC and DAR tasks, respectively.
- We propose a co-interactive graph framework where a cross-utterance connection and a cross-task connection are constructed and iteratively updated with each other, simultaneously modeling the contextual information and the mutual interaction information in a unified architecture (a generic attention layer for such updates is sketched after this list).
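The framework builds on graph attention networks (Velickovic et al. 2017). To make the "iteratively updated" wording concrete, here is a generic single-head GAT layer in PyTorch; it is not the authors' implementation, and the tensor shapes and names are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """A generic single-head graph attention layer (Velickovic et al. 2017)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) 0/1 torch tensor with
        # self-loops, so every row has at least one neighbor for the softmax.
        z = self.W(h)                                   # (N, out_dim)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)            # z_i repeated per column
        zj = z.unsqueeze(0).expand(n, n, -1)            # z_j repeated per row
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))      # keep only graph edges
        alpha = torch.softmax(e, dim=-1)                # attention over neighbors
        return F.elu(alpha @ z)                         # updated node features
```

Stacking such layers over a co-interactive adjacency like the one sketched in the introduction would let act and sentiment node representations refine each other across iterations.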
Methods
- Datasets
The authors conduct experiments on the benchmark DailyDialog (Li et al. 2017) and Mastodon (Cerisara et al. 2018) datasets; DailyDialog contains 11,118 training dialogues.
- Baseline models: HEC (Kumar et al. 2018), CRF-ASN (Chen et al. 2018), CASA (Raheja and Tetreault 2019), DialogueRNN (Majumder et al. 2019), DialogueGCN (Ghosal et al. 2019), JointDAS (Cerisara et al. 2018), IIIM (Kim and Kim 2018), and DCR-Net + Co-Attention (Qin et al. 2020a).
Results
- Following Kim and Kim (2018), Cerisara et al. (2018), and Qin et al. (2020a), the authors adopt macro-averaged Precision, Recall, and F1 for both sentiment classification and dialog act recognition on the DailyDialog dataset, and adopt the average of the dialog-act-specific F1 scores weighted by the prevalence of each dialog act on the Mastodon dataset (both schemes can be computed as in the sketch after this list).
- The experimental results are shown in Table 1. The authors' framework outperforms the state-of-the-art dialog act recognition and sentiment classification models trained on the separate tasks in all metrics on both datasets.
- This shows that the proposed graph interaction model has incorporated the mutual interaction information between the two tasks, which can be effectively utilized to mutually promote performance.
- The authors find that the integration of Co-GAT and RoBERTa/XLNet can further improve performance, demonstrating that the contributions of the two are complementary.
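For reference, both evaluation schemes map onto standard scikit-learn calls; a minimal sketch with invented toy labels (the label values are purely illustrative):

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy gold and predicted labels, purely for illustration
gold = ["positive", "negative", "neutral", "positive", "neutral"]
pred = ["positive", "neutral", "neutral", "positive", "negative"]

# DailyDialog protocol: macro-averaged Precision, Recall, and F1
p, r, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0)

# Mastodon protocol: F1 averaged with weights given by label prevalence
_, _, wf1, _ = precision_recall_fscore_support(
    gold, pred, average="weighted", zero_division=0)

print(f"macro P/R/F1: {p:.2f}/{r:.2f}/{f1:.2f}; weighted F1: {wf1:.2f}")
```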
Conclusion
- The authors propose a co-interactive graph framework where a cross-utterance connection and a cross-task connection are constructed and iteratively updated with each other, simultaneously modeling the contextual information and the mutual interaction information in a unified architecture.
- Experiments on two datasets show the effectiveness of the proposed models, and the model achieves state-of-the-art performance.
- The authors analyze the effect of incorporating strong pre-trained models into the joint model and find that the framework remains beneficial when combined with pre-trained models (BERT, RoBERTa, XLNet); a sketch of such an encoder swap follows.
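As a hedged illustration of what combining the framework with a pre-trained model can look like, the sketch below derives utterance vectors from an off-the-shelf RoBERTa encoder via the Hugging Face transformers library; these vectors would stand in for the bidirectional-LSTM utterance representations as graph node features. The model choice and mean pooling are our assumptions, not the authors' exact setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

utterances = ["Hi, how are you?", "I'm great, thanks!"]  # toy dialog
batch = tokenizer(utterances, padding=True, return_tensors="pt")

with torch.no_grad():
    out = encoder(**batch)

# Mean-pool token states into one vector per utterance; these vectors
# would replace the BiLSTM utterance representations as graph node
# features (pooling choice is an assumption, not the authors' setup).
mask = batch["attention_mask"].unsqueeze(-1).float()
utt_repr = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
print(utt_repr.shape)  # torch.Size([2, 768])
```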
Summary
Introduction:
Dialog act recognition (DAR) and sentiment classification (SC) are two correlative tasks for correctly understanding speakers' utterances in a dialog system (Cerisara et al. 2018; Lin, Xu, and Zhang 2020; Qin et al. 2020a).
- SC can detect the sentiments in utterances, which helps to capture speakers' implicit intentions.
- There are two key factors that contribute to dialog act recognition and sentiment prediction.
- One is the mutual interaction information across the two tasks and the other is the contextual information across utterances in a dialog.
Objectives:
Edges: Since the authors aim to model the speaker information in a dialog explicitly, vertex i and vertex j should be connected if they belong to the same speaker.
Methods:
Datasets
The authors conduct experiments on the benchmark DailyDialog (Li et al. 2017) and Mastodon (Cerisara et al. 2018) datasets; DailyDialog contains 11,118 training dialogues.
- Baseline models: HEC (Kumar et al. 2018), CRF-ASN (Chen et al. 2018), CASA (Raheja and Tetreault 2019), DialogueRNN (Majumder et al. 2019), DialogueGCN (Ghosal et al. 2019), JointDAS (Cerisara et al. 2018), IIIM (Kim and Kim 2018), and DCR-Net + Co-Attention (Qin et al. 2020a).
Results:
Following Kim and Kim (2018), Cerisara et al. (2018), and Qin et al. (2020a), the authors adopt macro-averaged Precision, Recall, and F1 for both sentiment classification and dialog act recognition on the DailyDialog dataset, and adopt the average of the dialog-act-specific F1 scores weighted by the prevalence of each dialog act on the Mastodon dataset.
The experimental results are shown in Table 1.
- The authors' framework outperforms the state-of-the-art dialog act recognition and sentiment classification models trained on the separate tasks in all metrics on both datasets.
- This shows that the proposed graph interaction model has incorporated the mutual interaction information between the two tasks, which can be effectively utilized to mutually promote performance.
- The authors find that the integration of Co-GAT and RoBERTa/XLNet can further improve performance, demonstrating that the contributions of the two are complementary.
Conclusion:
The authors propose a co-interactive graph framework where a cross-utterance connection and a cross-task connection are constructed and iteratively updated with each other, simultaneously modeling the contextual information and the mutual interaction information in a unified architecture.
- Experiments on two datasets show the effectiveness of the proposed models, and the model achieves state-of-the-art performance.
- The authors analyze the effect of incorporating strong pre-trained models into the joint model and find that the framework remains beneficial when combined with pre-trained models (BERT, RoBERTa, XLNet).
Tables
- Table1: Comparison of our model with baselines on Mastodon and Dailydialog datasets. SC represents Sentiment Classification and DAR represents Dialog Act Recognition. The numbers with * indicate that the improvement of our model over all baselines is statistically significant with p < 0.05 under t-test
- Table2: Ablation study on Mastodon and Dailydialog test datasets
- Table3: Results on the pre-trained models
Related work
- Dialog Act Recognition
Kalchbrenner and Blunsom (2013) propose a hierarchical CNN to model the context information for DAR. Lee and Dernoncourt (2016) propose a model that combines the advantages of CNNs and RNNs and incorporates the previous utterance as context to classify the current one for DAR. Ji, Haffari, and Eisenstein (2016) use a hybrid architecture, combining an RNN language model with a latent variable model. Furthermore, many works (Liu et al. 2017; Kumar et al. 2018; Chen et al. 2018) explore different architectures to better incorporate the context information for DAR. Raheja and Tetreault (2019) propose a context-aware self-attention mechanism for DAR and achieve promising performance.
Sentiment Classification
Funding
- This work was supported by the National Key R&D Program of China via grant 2020AAA0106501 and the National Natural Science Foundation of China (NSFC) via grants 61976072 and 61772153.
Reference
- Cerisara, C.; Jafaritazehjani, S.; Oluokun, A.; and Le, H. T. 2018. Multi-task dialog act and sentiment recognition on Mastodon. In Proceedings of the 27th International Conference on Computational Linguistics, 745–754. Santa Fe, New Mexico, USA: Association for Computational Linguistics. URL https://www.aclweb.org/anthology/C18-1063.
- Chai, Z.; and Wan, X. 2020. Learning to Ask More: Semi-Autoregressive Sequential Question Generation under Dual-Graph Interaction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 225–237. Online: Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.21. URL https://www.aclweb.org/anthology/2020.acl-main.21.
- Chen, Z.; Yang, R.; Zhao, Z.; Cai, D.; and He, X. 2018. Dialogue act recognition via crf-attentive structured network. In Proc. of SIGIR.
- Conneau, A.; Schwenk, H.; Barrault, L.; and Lecun, Y. 2017. Very Deep Convolutional Networks for Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 1107–1116. Valencia, Spain: Association for Computational Linguistics. URL https://www.aclweb.org/anthology/E17-1104.
- Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of NAACL.
- Ghosal, D.; Majumder, N.; Poria, S.; Chhaya, N.; and Gelbukh, A. 2019. DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 154–164. Hong Kong, China: Association for Computational Linguistics. doi:10.18653/v1/D19-1015. URL https://www.aclweb.org/anthology/D19-1015.
- Hochreiter, S.; and Schmidhuber, J. 1997. Long short-term memory. Neural computation.
- Ji, Y.; Haffari, G.; and Eisenstein, J. 2016. A latent variable recurrent neural network for discourse relation language models. arXiv preprint arXiv:1603.01913.
- Johnson, R.; and Zhang, T. 2017. Deep Pyramid Convolutional Neural Networks for Text Categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 562–570. Vancouver, Canada: Association for Computational Linguistics. doi:10.18653/v1/P17-1052. URL https://www.aclweb.org/anthology/P17-1052.
- Kalchbrenner, N.; and Blunsom, P. 2013. Recurrent convolutional neural networks for discourse compositionality. arXiv preprint arXiv:1306.3584.
- Kim, M.; and Kim, H. 2018. Integrated neural network model for identifying speech acts, predicators, and sentiments of dialogue utterances. Pattern Recognition Letters.
- Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Kumar, H.; Agarwal, A.; Dasgupta, R.; and Joshi, S. 2018. Dialogue act sequence labeling using hierarchical encoder with CRF. In Proc. of AAAI.
- Lee, J. Y.; and Dernoncourt, F. 2016. Sequential short-text classification with recurrent and convolutional neural networks. arXiv preprint arXiv:1603.03827.
- Li, Y.; Su, H.; Shen, X.; Li, W.; Cao, Z.; and Niu, S. 2017. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 986–995. Taipei, Taiwan: Asian Federation of Natural Language Processing. URL https://www.aclweb.org/anthology/I17-1099.
- Lin, T.-E.; Xu, H.; and Zhang, H. 2020. Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement. In Thirty-Fourth AAAI Conference on Artificial Intelligence.
- Liu, Y.; Han, K.; Tan, Z.; and Lei, Y. 2017. Using Context Information for Dialog Act Classification in DNN Framework. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2170–2178. Copenhagen, Denmark: Association for Computational Linguistics. doi:10.18653/v1/D17-1231. URL https://www.aclweb.org/anthology/D17-1231.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Lu, Y.-J.; and Li, C.-T. 2020. GCAN: Graph-aware Co-Attention Networks for Explainable Fake News Detection on Social Media. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 505–514. Online: Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.48. URL https://www.aclweb.org/anthology/2020.acl-main.48.
- Majumder, N.; Poria, S.; Hazarika, D.; Mihalcea, R.; Gelbukh, A.; and Cambria, E. 2019. DialogueRNN: An attentive RNN for emotion detection in conversations. In Proceedings of the AAAI Conference on Artificial Intelligence.
- Qin, L.; Che, W.; Li, Y.; Ni, M.; and Liu, T. 2020a. DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification. In Proceedings of the AAAI Conference on Artificial Intelligence.
- Qin, L.; Che, W.; Li, Y.; Wen, H.; and Liu, T. 2019. A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding. In Proc. of EMNLP.
- Qin, L.; Ni, M.; Zhang, Y.; Che, W.; Li, Y.; and Liu, T. 2020b. Multi-Domain Spoken Language Understanding Using Domain- and Task-Aware Parameterization. arXiv preprint arXiv:2004.14871.
- Qin, L.; Xu, X.; Che, W.; and Liu, T. 2020c. AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling. In Findings of the Association for Computational Linguistics: EMNLP 2020, 1807–1816. Online: Association for Computational Linguistics. doi:10.18653/v1/2020.findings-emnlp.163. URL https://www.aclweb.org/anthology/2020.findings-emnlp.163.
- Raheja, V.; and Tetreault, J. 2019. Dialogue Act Classification with Context-Aware Self-Attention. In Proc. of NAACL.
- Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20(1): 61–80.
- Shi, Y.; Yao, K.; Tian, L.; and Jiang, D. 2016. Deep LSTM based Feature Mapping for Query Classification. In Proc. of NAACL.
- Tang, D.; Qin, B.; and Liu, T. 2015. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification. In Proc. of ACL.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L. u.; and Polosukhin, I. 2017. Attention is All you Need. In Proc. of NIPS. Curran Associates, Inc.
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.
- Wang, B. 2018. Disconnected Recurrent Neural Networks for Text Categorization. In Proc. of ACL.
- Xiao, Y.; and Cho, K. 2016. Efficient character-level document classification by combining convolution and recurrent layers. arXiv preprint arXiv:1602.00367.
- Xu, J.; Chen, D.; Qiu, X.; and Huang, X. 2016. Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification. In Proc. of EMNLP.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R. R.; and Le, Q. V. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems, 5753–5763.
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; and Hovy, E. 2016. Hierarchical Attention Networks for Document Classification. In Proc. of NAACL.
- Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In Proc. of NIPS, 649–657.