Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 725–736.

Keywords:
story ending generation, reasoning flow, relational path, graph convolutional network, external commonsense knowledge
TL;DR:
We present Generation with Multi-Hop Reasoning Flow (GRF), a model that reasons over structured commonsense knowledge during text generation

Abstract:

Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation. Existing approaches that integrate commonsense knowledge into generative pre-trained language models simply transfer relational …

Introduction
  • Despite the recent success of pre-trained language models such as GPT-2 (Radford et al., 2019) on various language generation tasks, these models still struggle on generation tasks that require reasoning over commonsense knowledge that is not explicitly stated in the context.
  • Figure 1 illustrates an example from the story ending generation task, where external commonsense knowledge in the form of relational paths can guide the generation of the key concept “substance”.
  • ROC Story context: Mr. Egg was presenting a volcanic eruption to the science class. He had a diagram of a volcano that looked like it was made of tinfoil.
  • Story ending: The volcano exploded with substance that looked like lava!
Highlights
  • Despite the recent success of pre-trained language models such as GPT-2 (Radford et al., 2019) on various language generation tasks, these models still struggle on generation tasks that require reasoning over commonsense knowledge that is not explicitly stated in the context
  • We propose Generation with Multi-Hop Reasoning Flow (GRF), a generation model that performs multi-hop reasoning on an external knowledge graph for knowledge-enriched language generation
  • We propose the dynamic multi-hop reasoning module that aggregates evidence along relational paths for the grounded generation of critical concepts
  • We focus on text generation tasks where reasoning over external commonsense knowledge is required
  • We show the results of our reasoning module with the mean(·) aggregator and observe a performance drop compared with max(·)
  • We present Generation with Multi-Hop Reasoning Flow that reasons over structured commonsense knowledge during text generation
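The max(·) vs. mean(·) aggregator contrast noted above can be illustrated with a minimal sketch of a single reasoning hop. This is our own simplification with invented names, not the paper's exact equations: each node's new evidence score aggregates the scores flowing in over its incoming edges.

```python
from statistics import mean

def propagate_scores(node_scores, edges, edge_weights, aggregator=max):
    """One illustrative reasoning hop: each target node's new score
    aggregates the evidence arriving over its incoming edges, i.e.
    the source node's score discounted by the edge weight."""
    incoming = {}
    for (head, tail), weight in zip(edges, edge_weights):
        incoming.setdefault(tail, []).append(node_scores.get(head, 0.0) * weight)
    updated = dict(node_scores)
    for tail, evidence in incoming.items():
        updated[tail] = aggregator(evidence)
    return updated

scores = {"volcano": 1.0, "eruption": 0.5}
edges = [("volcano", "lava"), ("eruption", "lava")]
weights = [0.8, 0.8]
propagate_scores(scores, edges, weights, max)["lava"]   # strongest single path -> 0.8
propagate_scores(scores, edges, weights, mean)["lava"]  # averaged evidence -> ~0.6
```

With max(·), one strong path is enough to promote a concept, which matches the reported advantage over mean(·), where weak paths dilute the evidence.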
Methods
  • The authors focus on text generation tasks where reasoning over external commonsense knowledge is required.
  • The input source is a text sequence x = (x1, x2, · · · , xN ) which may consist of several sentences.
  • Since direct reasoning on the complete knowledge graph G = (V, E) is intractable, the authors extract a sub-graph G′ = (V′, E′) given the input text, where V′ ⊂ V and E′ ⊂ E.
  • The sub-graph consists of inter-connected H-hop paths starting from the source concepts Cx extracted from the input text.
  • The task is formulated as generating the best hypothesis y∗ that maximizes the conditional probability y∗ = arg max_y P(y | x, G′), where G′ is the extracted sub-graph.
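The sub-graph extraction step described above can be sketched as a breadth-first expansion from the source concepts up to H hops. This is a rough illustration under our own assumptions (function and variable names are ours), not the authors' exact procedure:

```python
from collections import deque

def extract_subgraph(triples, source_concepts, max_hops=2):
    """Collect all triples reachable within max_hops hops from the
    source concepts matched in the input text (breadth-first)."""
    adjacency = {}
    for head, rel, tail in triples:
        adjacency.setdefault(head, []).append((rel, tail))

    nodes, sub_edges = set(source_concepts), set()
    frontier = deque((c, 0) for c in source_concepts)
    while frontier:
        concept, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond H hops
        for rel, tail in adjacency.get(concept, []):
            sub_edges.add((concept, rel, tail))
            if tail not in nodes:
                nodes.add(tail)
                frontier.append((tail, depth + 1))
    return nodes, sub_edges

# Toy ConceptNet-style triples, invented for illustration.
triples = [
    ("volcano", "RelatedTo", "lava"),
    ("lava", "IsA", "substance"),
    ("substance", "RelatedTo", "chemistry"),
]
nodes, edges = extract_subgraph(triples, {"volcano"}, max_hops=2)
# "substance" is reachable in 2 hops; "chemistry" (3 hops) is excluded.
```

Capping the expansion at H hops is what keeps the reasoning tractable: the sub-graph stays small relative to the full knowledge graph while still containing the inter-connected paths from the source concepts.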
Results
  • The human evaluation results are presented in Table 6, where the model significantly outperforms the compared baselines on both criteria across all three datasets.
Conclusion
  • The authors present Generation with Multi-Hop Reasoning Flow that reasons over structured commonsense knowledge during text generation.
  • The proposed method leverages both the structural and semantic information of the external knowledge base by performing dynamic multi-hop reasoning on the relational paths.
  • The authors conduct extensive experiments and empirically show that the method outperforms existing approaches that integrate commonsense knowledge into pre-trained language models on three text generation tasks.
  • The authors demonstrate the interpretability of the method with inferred reasoning paths that provide rationale for the generated results.
Tables
  • Table1: Statistics of the datasets used in this paper. *:Examples with multiple references are counted separately
  • Table2: Statistics of the extracted subgraphs on the training sets of three datasets, including the average number of concepts and triples for each subgraph
  • Table3: Automatic evaluation results on the test set of EG and αNLG. Entries with N/A mean the baseline is not designated for this task. †: we use the generation results from Bhagavatula et al. (2020)
  • Table4: Automatic evaluation on the test set of SEG
  • Table5: Ablation study on the test set of αNLG. SMGE denotes static multi-relational graph encoding (see §3.2.1) and DMRF denotes dynamic multi-hop reasoning flow (see §3.2.3)
  • Table6: Human evaluation results on three datasets. Scores indicate the percentage of Win (W) and Lose (L) when comparing our model with a baseline in terms of fluency and reasonability. Scores marked with * mean p-value < 0.05 and ** indicates p-value < 0.01 in sign test. Entries with N/A mean the baseline is not designated for this task
  • Table7: Annotator agreement. Scores denote Fleiss’ kappa (Fleiss, 1971), which evaluates the agreement among multiple annotators in terms of fluency and reasonability
  • Table8: Case study on the test set of three datasets. Words in blue denote source concepts in the input contexts while words in orange are the associated concepts generated by the GRF
Related work
  • 2.1 Commonsense-Aware Neural Text Generation

    Incorporating commonsense knowledge is essential for text generation to augment the limited textual information. In dialogue generation, Zhou et al (2018) enriched the context representations of the post with neighbouring concepts on ConceptNet using graph attention. In story ending generation, Guan et al (2019) proposed incremental encoding with multi-source attention to incorporate a one-hop knowledge graph for concepts in the story context. In topic-to-essay generation, Yang et al (2019) augmented the generator with a concept memory that updates dynamically via a gate mechanism. Recently, some work has also attempted to integrate external commonsense knowledge into generative pre-trained language models such as GPT-2 (Radford et al, 2019). Guan et al (2020) conducted post-training on synthetic data constructed from commonsense knowledge bases by translating triplets into natural language texts using templates. Bhagavatula et al (2020) transferred embeddings of COMET (Bosselut et al, 2019), a GPT-2 model fine-tuned to generate the tail entity of a triple in a commonsense knowledge graph, into another GPT-2 model for text generation. In comparison, our model utilizes both the structural and semantic information of the commonsense knowledge graph during generation and does not suffer from the catastrophic forgetting problem (Kirkpatrick et al, 2016) caused by implicit knowledge transfer.
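The graph-attention enrichment described for Zhou et al (2018) can be sketched roughly as follows. This is a schematic dot-product variant with invented names, not their exact formulation: a word's representation is augmented with a softmax-weighted mixture of its neighbouring concept embeddings.

```python
import math

def attend_neighbours(word_vec, neighbour_vecs):
    """Weight each neighbouring concept embedding by its softmax-
    normalized dot-product affinity with the word vector, and return
    the attention-weighted mixture used to enrich the word."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    scores = [dot(word_vec, n) for n in neighbour_vecs]
    m = max(scores)                      # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * n[i] for w, n in zip(weights, neighbour_vecs))
            for i in range(len(word_vec))]

word = [1.0, 0.0]
neighbours = [[1.0, 0.0], [0.0, 1.0]]
enriched = attend_neighbours(word, neighbours)
# The neighbour aligned with the word dominates the mixture.
```

The key limitation GRF addresses is visible here: this enrichment only looks one hop out from each word, whereas multi-hop reasoning follows chains of relations across the sub-graph.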
Funding
  • This work was jointly supported by the NSFC projects (key project No. 61936010 and regular project No. 61876096) and the Guoqiang Institute of Tsinghua University (Grant No. 2019GQG1)
Case Study
  • The authors provide test cases on the three datasets in Table 8 and observe that baseline models tend to generate general cases, while the GRF is able to generate more specific concepts by exploring the plau…

Reference
  • Satanjeev Banerjee and Alon Lavie. 2005. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization@ACL 2005, Ann Arbor, Michigan, USA, June 29, 2005, pages 65–72. Association for Computational Linguistics.
  • Lisa Bauer, Yicheng Wang, and Mohit Bansal. 2018. Commonsense for generative multi-hop question answering tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 4220–4230. Association for Computational Linguistics.
  • Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, and Yejin Choi. 2020. Abductive commonsense reasoning. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
  • Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 2787–2795.
  • Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi. 2019. COMET: commonsense transformers for automatic knowledge graph construction. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pages 4762–4779. Association for Computational Linguistics.
  • Nicola De Cao, Wilker Aziz, and Ivan Titov. 2019. Question answering by reasoning across documents with graph convolutional networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 2306–2317. Association for Computational Linguistics.
  • Wenhu Chen, Wenhan Xiong, Xifeng Yan, and William Yang Wang. 2018. Variational knowledge graph reasoning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 1823–1832. Association for Computational Linguistics.
  • Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, and Andrew McCallum. 2018. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
  • Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382.
  • Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O. K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics.
  • Jian Guan, Fei Huang, Minlie Huang, Zhihao Zhao, and Xiaoyan Zhu. 2020. A knowledge-enhanced pretraining model for commonsense story generation. Trans. Assoc. Comput. Linguistics, 8:93–108.
  • Jian Guan, Yansen Wang, and Minlie Huang. 2019. Story ending generation with incremental encoding and commonsense knowledge. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 6473–6480. AAAI Press.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • Thomas N. Kipf and Max Welling. 2017. Semisupervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
  • James N. Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. 2016. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 114(13):3521–3526.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pages 110–119. The Association for Computational Linguistics.
  • Bill Yuchen Lin, Xinyue Chen, Jamin Chen, and Xiang Ren. 2019. KagNet: Knowledge-aware graph networks for commonsense reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2829–2839, Hong Kong, China. Association for Computational Linguistics.
  • Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
  • Xi Victoria Lin, Richard Socher, and Caiming Xiong. 2018. Multi-hop knowledge graph reasoning with reward shaping. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 3243–3253. Association for Computational Linguistics.
  • Zhibin Liu, Zheng-Yu Niu, Hua Wu, and Haifeng Wang. 2019. Knowledge aware conversation generation with explainable reasoning over augmented graphs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 1782–1792. Association for Computational Linguistics.
  • Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, and Songlin Hu. 2019. Graphbased reasoning over heterogeneous external knowledge for commonsense question answering. CoRR, abs/1909.05311.
  • Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 1506–1515. Association for Computational Linguistics.
  • Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. Opendialkg: Explainable conversational reasoning with attention-based walks over knowledge graphs. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pages 845–854. Association for Computational Linguistics.
  • Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James F. Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pages 839–849. The Association for Computational Linguistics.
  • Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA, pages 311–318. ACL.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch.
  • Lin Qiu, Yunxuan Xiao, Yanru Qu, Hao Zhou, Lei Li, Weinan Zhang, and Yong Yu. 2019. Dynamically fused graph network for multi-hop reasoning. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pages 6140–6150. Association for Computational Linguistics.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.
  • Maarten Sap, Ronan Le Bras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2019. ATOMIC: an atlas of machine commonsense for if-then reasoning. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 3027–3035. AAAI Press.
  • Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, pages 593–607. Springer.
  • Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1073–1083. Association for Computational Linguistics.
  • R. Speer and Catherine Havasi. 2012. Representing general relational knowledge in conceptnet 5. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 23-25, 2012, pages 3679–3686. European Language Resources Association (ELRA).
  • Trieu H. Trinh and Quoc V. Le. 2018. A simple method for commonsense reasoning. CoRR, abs/1806.02847.
  • Yi-Lin Tuan, Yun-Nung Chen, and Hung-yi Lee. 2019. Dykgchat: Benchmarking dialogue generation grounding on dynamic knowledge graphs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 1855– 1865. Association for Computational Linguistics.
  • Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and Partha P. Talukdar. 2020. Composition-based multi-relational graph convolutional networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
  • Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. 2015. Cider: Consensus-based image description evaluation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 4566– 4575. IEEE Computer Society.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
  • Cunxiang Wang, Shuailong Liang, Yue Zhang, Xiaonan Li, and Tian Gao. 2019. Does it make sense? and why? A pilot study for sense making and explanation. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pages 4020–4026. Association for Computational Linguistics.
  • Pengcheng Yang, Lei Li, Fuli Luo, Tianyu Liu, and Xu Sun. 2019. Enhancing topic-to-essay generation with external commonsense knowledge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2002–2012, Florence, Italy. Association for Computational Linguistics.
  • Zhi-Xiu Ye, Qian Chen, Wen Wang, and Zhen-Hua Ling. 2019.
  • Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018. Commonsense knowledge aware conversation generation with graph attention. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 4623–4629. ijcai.org.