Cognitive Graph for Multi-Hop Reading Comprehension at Scale

Meeting of the Association for Computational Linguistics, 2019.

Keywords:
machine reading comprehension, deep learning, multi-hop reading comprehension, graph neural network, cognitive graph

Abstract:

We propose a new CogQA framework for multi-hop question answering in web-scale documents. Inspired by the dual process theory in cognitive science, the framework gradually builds a cognitive graph in an iterative process by coordinating an implicit extraction module (System 1) and an explicit reasoning module (System 2). While ...
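The iterative System 1 / System 2 coordination described above can be sketched roughly as follows. This is a minimal, runnable illustration, not the authors' implementation: `system1_extract`, `system2_reason`, and the toy link table are hypothetical stand-ins (in the paper, System 1 is BERT reading paragraphs and System 2 is a graph neural network propagating hidden states).

```python
def system1_extract(question, entity, clues):
    """Placeholder System 1: return (next-hop entities, answer spans).
    A toy link table stands in for reading paragraphs with BERT."""
    toy_links = {"Q": ["A", "B"], "A": ["ans"], "B": []}
    hops = toy_links.get(entity, [])
    return [h for h in hops if h != "ans"], [h for h in hops if h == "ans"]

def system2_reason(graph):
    """Placeholder System 2: the paper updates node hidden states with a GNN;
    this sketch leaves the graph unchanged."""
    return graph

def cogqa_sketch(question, start_entities):
    """Iteratively expand a cognitive graph until no new nodes remain."""
    graph = {"nodes": set(start_entities), "edges": set()}
    frontier = list(start_entities)
    answers = set()
    while frontier:                       # iterate until the graph stops growing
        entity = frontier.pop()
        next_hops, spans = system1_extract(question, entity, clues=graph)
        for hop in next_hops:             # add new next-hop nodes and edges
            graph["edges"].add((entity, hop))
            if hop not in graph["nodes"]:
                graph["nodes"].add(hop)
                frontier.append(hop)
        answers.update(spans)
        graph = system2_reason(graph)     # explicit reasoning over the graph
    return graph, answers
```

On the toy links, starting from entity "Q" the loop visits "A" and "B" and collects the answer span found at "A"; the real framework instead ranks answer candidates by the GNN's reasoning over the graph.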

Introduction
  • Deep learning models have made significant strides in machine reading comprehension and have even outperformed humans on single-paragraph question answering (QA) benchmarks such as SQuAD (Wang et al., 2018b; Devlin et al., 2018; Rajpurkar et al., 2016).
  • As revealed by adversarial tests (Jia and Liang, 2017), models for single paragraph QA tend to seek answers in sentences matched by the question, which does not involve complex reasoning.
  • HotpotQA (Yang et al., 2018) requires models to provide supporting sentences, which demands explainable multi-hop reasoning.
Highlights
  • Deep learning models have made significant strides in machine reading comprehension and have even outperformed humans on single-paragraph question answering (QA) benchmarks such as SQuAD (Wang et al., 2018b; Devlin et al., 2018; Rajpurkar et al., 2016)
  • Following Yang et al. (2018), answers and supporting facts are evaluated with two metrics: Exact Match (EM) and F1 score
  • We present a new framework, Cognitive Graph QA (CogQA), to tackle the multi-hop machine reading problem at scale
  • The reasoning process is organized as a cognitive graph, reaching unprecedented entity-level explainability
  • Our implementation, based on BERT and a graph neural network, obtains state-of-the-art results on the HotpotQA dataset, which shows the efficacy of our framework
  • We expect that prospective architectures combining attention and recurrent mechanisms will largely improve the capacity of System 1 by optimizing the interaction between the two systems
Results
  • Following Yang et al. (2018), answers and supporting facts are evaluated with two metrics: Exact Match (EM) and F1 score.
  • Joint precision and recall are the products of the corresponding answer (Ans) and supporting-fact (Sup) scores, and joint F1 is computed from them.
  • All results for these metrics are averaged over the test set.
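The joint-metric computation described above can be sketched as follows. The function names are illustrative, assuming per-example precision and recall for the answer (Ans) and supporting facts (Sup) are already available from the standard EM/F1 evaluation.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def joint_metrics(p_ans, r_ans, p_sup, r_sup):
    """Joint precision and recall are the products of the answer and
    supporting-fact scores; joint F1 is computed from those products,
    following the HotpotQA evaluation (Yang et al., 2018)."""
    p_joint = p_ans * p_sup
    r_joint = r_ans * r_sup
    return p_joint, r_joint, f1(p_joint, r_joint)
```

For instance, a perfect answer with half-correct supporting facts (`joint_metrics(1.0, 1.0, 0.5, 0.5)`) yields joint precision, recall, and F1 of 0.5 each, so a model is rewarded only when both outputs are right on the same example.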
Conclusion
  • The authors present a new framework, CogQA, to tackle the multi-hop machine reading problem at scale.
  • The reasoning process is organized as a cognitive graph, reaching unprecedented entity-level explainability.
  • Benefiting from the explicit structure of the cognitive graph, System 2 in CogQA has the potential to leverage neural logic techniques to improve reliability.
  • The authors believe that the framework can generalize to other cognitive tasks, such as conversational AI and sequential recommendation.
Tables
  • Table 1: Results on HotpotQA (fullwiki setting). The test set is not public; the maintainer of HotpotQA offers only EM and F1 for each submission. N/A means the model cannot find supporting facts.
Related work
  • Machine Reading Comprehension The research focus of machine reading comprehension (MRC) has gradually shifted from cloze-style tasks (Hermann et al., 2015; Hill et al., 2015) to more complex QA tasks (Rajpurkar et al., 2016) in recent years. Compared to the traditional computational linguistic pipeline (Hermann et al., 2015), neural network models, for example BiDAF (Seo et al., 2017a) and R-net (Wang et al., 2017), exhibit outstanding capacity for answer extraction from text. Pre-trained on large corpora, recent BERT-based models have nearly solved the single-paragraph MRC-QA problem, with performance beyond human level, driving researchers to pay more attention to multi-hop reasoning.

    Multi-Hop QA Pioneering datasets of multi-hop QA are either based on limited knowledge-base schemas (Talmor and Berant, 2018) or set under a multiple-choice setting (Welbl et al., 2018). The noise in these datasets also restricted the development of multi-hop QA until the high-quality HotpotQA (Yang et al., 2018) was recently released. The idea of "multi-step reasoning" also breeds multi-turn methods in single-paragraph QA (Kumar et al., 2016; Seo et al., 2017b; Shen et al., 2017), which assume that models can implicitly capture information at a deeper level by reading the text again.
Funding
  • The work is supported by the Development Program of China (2016QY01W0200), the NSFC for Distinguished Young Scholars (61825602), NSFC (61836013), and a research fund supported by Alibaba.
References
  • Alan Baddeley. 1992. Working memory. Science, 255(5044):556–559.
  • Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.
  • Nicholas J. Belkin. 1993. Interaction with texts: Information retrieval as information-seeking behavior. In Information Retrieval.
  • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1870–1879.
  • Christopher Clark and Matt Gardner. 2018. Simple and effective multi-paragraph reading comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 845–855.
  • Michael Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Jonathan St BT Evans. 1984. Heuristic and analytic processes in reasoning. British Journal of Psychology, 75(4):451–468.
  • Jonathan St BT Evans. 2003. In two minds: dual-process accounts of reasoning. Trends in Cognitive Sciences, 7(10):454–459.
  • Jonathan St BT Evans. 2008. Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59:255–278.
  • Dan Hendrycks and Kevin Gimpel. 2016. Bridging nonlinearities and stochastic regularizers with gaussian error linear units. arXiv preprint arXiv:1606.08415.
  • Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.
  • Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. 2015. The goldilocks principle: Reading children’s books with explicit memory representations. arXiv preprint arXiv:1511.02301.
  • Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, and Ming Zhou. 2018. Reinforced mnemonic reader for machine reading comprehension. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4099–4106. AAAI Press.
  • Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031.
  • Daniel Kahneman and Patrick Egan. 2011. Thinking, fast and slow. Farrar, Straus and Giroux.
  • Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
  • Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In International Conference on Machine Learning, pages 1378–1387.
  • Dan Moldovan, Sanda Harabagiu, Marius Pasca, Rada Mihalcea, Roxana Girju, Richard Goodrum, and Vasile Rus. 2000. The structure and performance of an open-domain question answering system. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 563–570. Association for Computational Linguistics.
  • Gonzalo Navarro. 2001. A guided tour to approximate string matching. ACM Computing Surveys (CSUR), 33(1):31–88.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392.
  • Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017a. Bidirectional attention flow for machine comprehension. In International Conference on Learning Representations.
  • Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. 2017b. Query-reduction networks for question answering. In International Conference on Learning Representations.
  • Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. 2017. ReasoNet: Learning to stop reading in machine comprehension. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1047–1055. ACM.
  • Steven A Sloman. 1996. The empirical case for two systems of reasoning. Psychological Bulletin, 119(1):3.
  • Alon Talmor and Jonathan Berant. 2018. The web as a knowledge-base for answering complex questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), volume 1, pages 641–651.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Ellen M Voorhees et al. 1999. The TREC-8 question answering track report. In TREC, volume 99, pages 77–82. Citeseer.
  • Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerry Tesauro, Bowen Zhou, and Jing Jiang. 2018a. R3: Reinforced ranker-reader for open-domain question answering. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Wei Wang, Ming Yan, and Chen Wu. 2018b. Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1705–1714.
  • Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 189–198.
  • Yingxu Wang, Dong Liu, and Ying Wang. 2003. Discovering the capacity of human memory. Brain and Mind, 4(2):189–198.
  • Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6:287–302.
  • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380.