Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension

    ICLR, 2020.

    Keywords: neural symbolic, reading comprehension, question answering

    Abstract:

    Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting and arithmetic, but most existing approaches are hard to scale to more domains or more complex reasoning. In this work, we propose the Neural Symbolic Reader (NeRd), which includes...


    Introduction
    • Deep neural networks have achieved remarkable successes in natural language processing recently.
    • However, new datasets, e.g., DROP (Dua et al., 2019) and MathQA (Amini et al., 2019), have been collected to examine the capability of both language understanding and discrete reasoning, where the direct application of state-of-the-art models, such as the pre-trained language model BERT (Devlin et al., 2019) or QANet (Yu et al., 2018), achieves very low accuracy.
    • Integrating neural networks with symbolic reasoning is crucial for solving these new tasks.
    Highlights
    • Deep neural networks have achieved remarkable successes in natural language processing recently
    • We propose the Neural Symbolic Reader (NeRd) for reading comprehension, which consists of (1) a reader that encodes passages and questions into vector representations; and (2) a programmer that generates programs, which are executed to produce answers
    • We introduce a domain-specific language (DSL), which is used to interpret the tokens generated by the programmer component as an executable program
    • Besides the setting where all the ground-truth programs are provided during training, we also evaluate the weak-supervision setting on MathQA
    • We presented the Neural Symbolic Reader (NeRd) as a scalable integration of distributed representations and symbolic operations for reading comprehension
    • The Neural Symbolic Reader architecture consists of a reader that encodes text into vector representations and a programmer that generates programs, which are executed to produce the answer; a minimal sketch of this pipeline follows this list
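    As a purely illustrative sketch of this reader/programmer split, the Python snippet below mirrors the described data flow with hypothetical placeholder functions (reader, programmer, executor) and a trivial COUNT-only executor; it is not the authors' implementation.

```python
# Purely illustrative sketch of the reader/programmer split described above.
# All names (reader, programmer, executor) are hypothetical placeholders:
#   reader:     text -> one vector per token
#   programmer: vectors -> program tokens
#   executor:   program tokens -> answer
from typing import List


def reader(question: str, passage: str) -> List[List[float]]:
    """Stand-in encoder: one trivial vector per input token (a real reader uses BERT)."""
    tokens = (question + " [SEP] " + passage).split()
    return [[float(len(tok))] for tok in tokens]


def programmer(encodings: List[List[float]]) -> List[str]:
    """Stand-in decoder: emits a fixed DSL program instead of actually decoding one."""
    return ["COUNT", "PASSAGE_SPAN(5,5)", "PASSAGE_SPAN(7,8)"]


def executor(program: List[str]) -> str:
    """Toy executor: COUNT returns the number of its span arguments."""
    op, args = program[0], program[1:]
    if op == "COUNT":
        return str(len(args))
    raise ValueError(f"unknown operator: {op}")


passage = "Carpenter kicked field goals of 38 and 36 yards in the game"
question = "How many field goals were kicked?"
print(executor(programmer(reader(question, passage))))  # -> "2"
```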
    Results
    • Table 4 summarizes our main evaluation results on the DROP dataset, with 9.5K samples in the development set and 9.6K hidden samples in the test set.
    • Note that Andor et al. (2019) train their BERT-Calc model on CoQA (Reddy et al., 2019) in addition to DROP and evaluate an ensemble of 6 models, which achieves an exact match of 78.14 and an F1 score of 81.78 on the test set.
    • Without additional training data or ensembling, NeRd still beats their single model, and its performance is on par with their ensemble
    Conclusion
    • We presented the Neural Symbolic Reader (NeRd) as a scalable integration of distributed representations and symbolic operations for reading comprehension.
    • By introducing the span selection operators, our domain-agnostic architecture can generate compositional programs to perform complex reasoning over text for different domains by only extending the set of operators.
    • In our evaluation, using the same model architecture without any change, NeRd significantly surpasses the previous state of the art on two challenging reading comprehension tasks, DROP and MathQA.
    • We hope to motivate future work to introduce complex reasoning into other domains or other NLP tasks, e.g., machine translation and language modeling, by extending the set of operators
    Tables
    • Table 1: Overview of our domain-specific language. See Table 2 for sample usage
    • Table 2: Examples of correct predictions on the DROP development set
    • Table 3: An example in the MathQA dataset
    • Table 4: Results on the DROP dataset. On the development set, we present the mean and standard error of 10 NeRd models, and the test result of a single model. For all models, the performance breakdown over different question types is on the development set. Note that the training data of the BERT-Calc model (Andor et al., 2019) for test-set evaluation is augmented with CoQA (Reddy et al., 2019)
    • Table 5: Results on counting and sorting questions on the DROP development set, where we compare variants of NeRd with and without the corresponding operations. (a): counting; (b): sorting. For each setting, we present the best results on the development set
    • Table 6: Examples of counting and sorting questions on the DROP development set, where NeRd with the corresponding operations gives the correct predictions, while the variants without them do not. (a): counting; (b): sorting
    • Table 7: Results of different training algorithms on the DROP development set. For each setting, we present the best results on the development set
    • Table 8: Results on the MathQA test set, with NeRd and two variants: (1) no pre-training; (2) using 20% of the program annotations in training
    • Table 9: Some samples in the DROP training set with wrong annotations, which are discarded by NeRd because none of the annotated programs passes the threshold of our training algorithm
    • Table 10: Examples of wrong predictions on the DROP development set
    Related work
    • Reading comprehension and question answering have recently attracted a lot of attention from the NLP community, and a plethora of datasets has become available to evaluate different capabilities of reading comprehension models.

      Example passages, questions, and predicted programs from the DROP development set:

      Passage: ... with field goals of 38 and 36 yards by kicker Dan Carpenter ... followed by a 43-yard field goal by Carpenter ... 52-yard field goal ...
      Question: How many total field goals were kicked in the game?
      Predicted Program: COUNT(PASSAGE_SPAN(75,75), PASSAGE_SPAN(77,78), PASSAGE_SPAN(133,135), PASSAGE_SPAN(315,317))
      Result: COUNT('38', '36 yards', '43-yard', '52-yard') = 4
      Predicted Program (-counting): COUNT5; Result: 5

      Passage: ... with the five most common surgeries being breast augmentation, liposuction, breast reduction, eyelid surgery and abdominoplasty ...
      Question: How many of the five most common procedures are not done on the breasts?
      Predicted Program: COUNT(PASSAGE_SPAN(132,135), PASSAGE_SPAN(140,142), PASSAGE_SPAN(144,149))
      Result: COUNT('liposuction', 'eyelid surgery', 'abdominoplasty') = 3
      Predicted Program (-counting): COUNT4; Result: 4

      Passage: ... In the third quarter, Arizona's deficit continued to climb as Cassel completed a 76-yard touchdown pass to wide receiver Randy Moss ... quarterback Matt Leinart completed a 78-yard touchdown pass to wide receiver Larry Fitzgerald ...
      Question: Who threw the longest touchdown pass?
      Predicted Program: ARGMAX( ...

      Passage: ... Carney got a 38-yard field goal ... with Carney connecting on a 39-yard field goal ...
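    The span-selection and discrete operators that appear in these programs (PASSAGE_SPAN, COUNT, ARGMAX) can be given a toy executable form. The semantics below are assumptions for illustration, e.g. that PASSAGE_SPAN(i, j) returns passage tokens i through j and that ARGMAX returns the span containing the largest number; the paper's actual DSL (Table 1) is richer.

```python
# Toy executor for programs like the ones shown above. The operator semantics
# here are assumptions for illustration; the paper's DSL is richer.
import re
from typing import List


def passage_span(tokens: List[str], start: int, end: int) -> str:
    """PASSAGE_SPAN(start, end): passage tokens start..end (inclusive)."""
    return " ".join(tokens[start:end + 1])


def count(spans: List[str]) -> int:
    """COUNT(span, ...): number of selected spans."""
    return len(spans)


def argmax(spans: List[str]) -> str:
    """ARGMAX(span, ...): span containing the largest number (assumed semantics)."""
    def key(span: str) -> float:
        nums = re.findall(r"\d+", span)
        return max(map(float, nums)) if nums else float("-inf")
    return max(spans, key=key)


tokens = "with field goals of 38 and 36 yards by kicker Dan Carpenter".split()
spans = [passage_span(tokens, 4, 4), passage_span(tokens, 6, 7)]
print(count(spans))   # COUNT('38', '36 yards') -> 2
print(argmax(spans))  # -> '38'
```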
    Reference
    • Aida Amini, Saadia Gabriel, Peter Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. MathQA: Towards interpretable math word problem solving with operation-based formalisms. arXiv preprint arXiv:1905.13319, 2019.
    • Daniel Andor, Luheng He, Kenton Lee, and Emily Pitler. Giving BERT a calculator: Finding operations and arguments with reading comprehension. arXiv preprint arXiv:1909.00109, 2019.
    • Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Learning to compose neural networks for question answering. arXiv preprint arXiv:1601.01705, 2016.
    • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
    • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In EMNLP, 2013.
    • Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, and Pushmeet Kohli. Leveraging grammar and reinforcement learning for neural program synthesis. In International Conference on Learning Representations, 2018.
    • Jonathon Cai, Richard Shin, and Dawn Song. Making neural programming architectures generalize via recursion. In ICLR, 2017.
    • Xavier Carreras and Lluís Màrquez. Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004, pp. 89–97, 2004.
    • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 1870–1879, 2017.
    • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, and Eduard Hovy. Iterative search for weakly supervised semantic parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2669–2680, 2019.
    • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, 2019.
    • Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, and Denny Zhou. Neural logic machines. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
    • Li Dong and Mirella Lapata. Language to logical form with neural attention. In ACL, 2016.
    • Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proc. of NAACL, 2019.
    • Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. CoRR, abs/1410.5401, 2014. URL http://arxiv.org/abs/1410.5401.
    • Kelvin Guu, Panupong Pasupat, Evan Liu, and Percy Liang. From language to programs: Bridging reinforcement learning and maximum marginal likelihood. In ACL, 2017.
    • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
    • Minghao Hu, Yuxing Peng, Zhen Huang, and Dongsheng Li. A multi-type multi-span network for reading comprehension that requires discrete reasoning. arXiv preprint arXiv:1908.05514, 2019.
    • Robin Jia and Percy Liang. Data recombination for neural semantic parsing. In ACL, 2016.
    • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. Inferring and executing programs for visual reasoning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2989–2998, 2017.
    • Łukasz Kaiser and Ilya Sutskever. Neural GPUs learn algorithms. arXiv preprint arXiv:1511.08228, 2015.
    • Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, and Alexander M. Rush. OpenNMT: Neural machine translation toolkit. arXiv preprint arXiv:1805.11462, 2018.
    • Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. Neural semantic parsing with type constraints for semi-structured tables. In EMNLP, 2017.
    • Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. In ACL, 2017.
    • Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, and Ni Lao. Memory augmented policy optimization for program synthesis and semantic parsing. In NeurIPS, pp. 10015–10027, 2018.
    • Wang Ling, Dani Yogatama, Chris Dyer, and Phil Blunsom. Program induction by rationale generation: Learning to solve and explain algebraic word problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 158–167, 2017.
    • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692, 2019.
    • Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60, 2014.
    • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
    • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, and Luke Zettlemoyer. A discrete hard EM approach for weakly supervised question answering. arXiv preprint arXiv:1909.04849, 2019.
    • Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, and Dario Amodei. Learning a natural language interface with neural programmer. arXiv preprint arXiv:1611.08945, 2016.
    • Panupong Pasupat and Percy Liang. Compositional semantic parsing on semi-structured tables. In ACL, 2015.
    • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pp. 2227–2237, 2018.
    • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 2383–2392, 2016.
    • Siva Reddy, Danqi Chen, and Christopher D. Manning. CoQA: A conversational question answering challenge. TACL, 7:249–266, 2019.
    • Scott Reed and Nando de Freitas. Neural programmer-interpreters. In ICLR, 2016.
    • Azriel Rosenfeld and Mark Thurston. Edge and curve detection for visual scene analysis. IEEE Transactions on Computers, (5):562–569, 1971.
    • Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. In ICLR, 2017.
    • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
    • Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In NIPS, 2015.
    • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
    • Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 189–198, 2017.
    • Caiming Xiong, Victor Zhong, and Richard Socher. Dynamic coattention networks for question answering. CoRR, abs/1611.01604, 2016. URL http://arxiv.org/abs/1611.01604.
    • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. XLNet: Generalized autoregressive pretraining for language understanding. CoRR, abs/1906.08237, 2019.
    • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. QANet: Combining local convolution with global self-attention for reading comprehension. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018.
    • Victor Zhong, Caiming Xiong, and Richard Socher. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103, 2017.
    • We preprocess the input passages and questions similarly to the input preprocessing for the DROP dataset described in Andor et al. (2019). Specifically, to facilitate the use of BERT, we split up documents longer than L = 512 tokens. Meanwhile, we extract the locations and values of the numbers, so that they can be retrieved via indices when applying numerical operators. We apply the same input preprocessing to MathQA as well.
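    A minimal sketch of this preprocessing, assuming a whitespace tokenizer and a simple number-detection rule (both simplifications, not the authors' code):

```python
# Simplified sketch of the preprocessing described above: split passages longer
# than L tokens and record (index, value) for every number so that numerical
# operators can later refer to numbers by index. Tokenization is an assumption.
from typing import List, Tuple

MAX_LEN = 512  # L in the text


def split_long_passage(tokens: List[str], max_len: int = MAX_LEN) -> List[List[str]]:
    """Split a token sequence into chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def extract_numbers(tokens: List[str]) -> List[Tuple[int, float]]:
    """Return (token index, numeric value) for every number-like token."""
    numbers = []
    for i, tok in enumerate(tokens):
        cleaned = tok.replace(",", "").rstrip(".")
        try:
            numbers.append((i, float(cleaned)))
        except ValueError:
            continue
    return numbers


tokens = "Carpenter kicked field goals of 38 and 36 yards".split()
print(extract_numbers(tokens))  # [(5, 38.0), (7, 36.0)]
```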
    • The reader implementation is largely the same as in Andor et al. (2019). Specifically, for the embedding representation of the reader component, we feed the question and passage jointly into BERT, which provides the output vector e_i for each input token t_i. Unless otherwise specified, the encoder is initialized with the uncased whole-word-masking version of BERT-LARGE. We denote the size of e_i by H0.
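    As an illustration of this joint question-passage encoding, the snippet below uses the Hugging Face transformers library with the uncased whole-word-masking BERT-large checkpoint; the library choice is an assumption for illustration, since the paper does not specify a toolkit.

```python
# Sketch of the reader: feed question and passage jointly into BERT and take
# the output vector e_i of each token as its contextual representation.
# The use of Hugging Face `transformers` here is an assumption for illustration.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking")
model = BertModel.from_pretrained("bert-large-uncased-whole-word-masking")

question = "How many total field goals were kicked in the game?"
passage = "... field goals of 38 and 36 yards by kicker Dan Carpenter ..."

inputs = tokenizer(question, passage, return_tensors="pt",
                   truncation=True, max_length=512)
outputs = model(**inputs)
e = outputs.last_hidden_state  # shape (1, num_tokens, H0): one vector per token
```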
    • This formulation is similar to the attention mechanism introduced in prior work (Bahdanau et al., 2014). Correspondingly, we compute the attention vector att_p over the passage tokens and the attention vector att_q over the question tokens.
    • Here, w_i denotes the weight of selecting the i-th token as the next program token. This design is similar to the pointer network (Vinyals et al., 2015).
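    A minimal sketch of this additive-attention / pointer-style scoring, with toy dimensions and an assumed parameterization (the projection matrices and scoring vector below are illustrative, not the paper's exact formulation):

```python
# Minimal sketch of the attention / pointer-style scoring described above.
# Given the decoder state h and token encodings e_1..e_n, an additive
# (Bahdanau-style) score yields both the attention vector over tokens and the
# weight w_i of copying token i as the next program token.
import torch
import torch.nn.functional as F

H0, H = 8, 8                      # encoder / decoder hidden sizes (toy values)
n = 5                             # number of input tokens
e = torch.randn(n, H0)            # token encodings e_i from the reader
h = torch.randn(H)                # current decoder state

W_e = torch.randn(H, H0)          # projection of the encodings (assumed)
W_h = torch.randn(H, H)           # projection of the decoder state (assumed)
v = torch.randn(H)                # scoring vector (assumed)

scores = torch.tanh(e @ W_e.T + h @ W_h.T) @ v   # additive attention scores
w = F.softmax(scores, dim=0)                     # w_i: weight of selecting token i
att = w @ e                                      # attention vector over the tokens
print(w.shape, att.shape)                        # torch.Size([5]) torch.Size([8])
```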