
aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model

Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)


Abstract

As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and Long Short-Term Memory models (LSTMs) have recently been proposed for semantic matching of questions and answers. To achieve good results, however, these models have been combined with additional features ...

Introduction
  • Question answering (QA), which returns exact answers as either short facts or long passages to natural language questions issued by users, is a challenging task and plays a central role in the next generation of advanced web search [2, 21].
  • The weakness of the existing studies is that the proposed deep models, either based on CNNs or LSTMs, need to be combined with additional features such as word overlap features and BM25 to perform well
  • Without combining these additional features, their performance is significantly worse than the results obtained by the state-of-the-art methods based on linguistic feature engineering [32].
  • This led the authors to propose the following research question: RQ1 Without combining additional features, can deep learning models be built that achieve comparable or even better performance than methods using feature engineering?
Highlights
  • If we combine our model with a simple additional feature like Query Likelihood (QL), our method can achieve the state-of-the-art performance among current existing methods for ranking answers under multiple metrics
  • We summarize our observations as follows: (1) Both variants of the attention-based neural matching model, aNMM-1 and aNMM-2, show significant improvements in Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) on the TRAIN and TRAIN-ALL data sets compared with previous deep learning methods
  • For MRR, we can observe similar significant improvements for aNMM-1. These results show that with the value-shared weight scheme, instead of the position-shared weight scheme of convolutional neural networks (CNNs), and with term importance learning via the question attention network, aNMM can predict ranking scores with much higher accuracy than previous deep learning models for ranking answers
  • Our second experimental setting addresses RQ2 proposed in Section 1, where we ask whether our model can surpass the state-of-the-art results achieved by CNNs [34, 18] and Long Short-Term Memory models (LSTMs) [25] for answer ranking when combining additional features
  • With a simple additional feature, our method can achieve the new state-of-the-art performance among current existing methods
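The question attention network referred to in the highlights can be sketched as a softmax gate over question terms. The following is a minimal illustration only, with made-up names and random data, not the authors' implementation:

```python
import numpy as np

def question_attention(q_emb, v, term_scores):
    """Combine per-question-term matching scores into one ranking score.

    A softmax gate over v . q_j (v plays the role of a learned parameter
    vector) weights each question term's score by its learned importance,
    so important terms contribute more to the final answer-ranking score.
    """
    logits = q_emb @ v                      # one importance logit per question term
    gates = np.exp(logits - logits.max())   # numerically stable softmax
    gates /= gates.sum()
    return float(np.dot(gates, term_scores))

rng = np.random.default_rng(1)
q_emb = rng.normal(size=(4, 50))            # embeddings of 4 question terms (random here)
v = rng.normal(size=50)                     # attention parameter (random, not trained)
scores = np.array([0.2, 1.5, 0.7, 0.1])     # per-term scores from the matching network
final = question_attention(q_emb, v, scores)
```

Because the softmax gates form a convex combination, the final score always lies between the smallest and largest per-term score, which is what lets learned term importance re-weight, rather than distort, the matching evidence.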
Methods
  • MRR of previously published systems vs. aNMM (best setting of each model, trained on TRAIN-ALL):

    Method                          MRR
    Wang et al. (2007) [27]         0.6852
    Heilman and Smith (2010) [5]    0.6917
    Wang and Manning (2010) [26]    0.6951
    Yao et al. (2013) [31]          0.7477
    Severyn et al. (2013) [17]      0.7358
    Yih et al. (2013) [32]          0.7700
    aNMM-2                          0.7969
    aNMM-1                          0.7995

  • The model can also be trained much faster with good prediction accuracy
  • Comparing aNMM-1 with the strongest baseline, the enhanced lexical semantic models of Yih et al. [32], aNMM-1 achieves a 4.13% gain in MAP and a 3.83% gain in MRR
  • These results show that it is possible to build a uniform deep learning model such that it can achieve better performance than methods using feature engineering.
  • Combining the aNMM score with features like IDF-weighted word overlap features and BM25 may not increase the performance of aNMM by a large margin, as was the case in related work [34, 18, 25]
Results
  • The authors give some qualitative analysis and visualization of the model learning results.
  • For MRR, the authors can observe similar significant improvements for aNMM-1
  • These results show that with the value-shared weight scheme, instead of the position-shared weight scheme of CNNs, and with term importance learning via the question attention network, aNMM can predict ranking scores with much higher accuracy than previous deep learning models for ranking answers.
  • Choosing a suitable number of bins by optimizing this hyperparameter on validation data can help improve the performance of aNMM
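The value-shared weight scheme and the role of the bins can be illustrated roughly as follows. This is a simplified sketch with hypothetical bin boundaries and toy weights; the paper's network also applies non-linear activations, omitted here:

```python
import numpy as np

def value_shared_score(sim_row, weights, n_bins=10):
    """Score one question term's row of the QA matching matrix.

    Value-shared weights: similarities are bucketed by *value* into bins
    over [-1, 1] (exact matches, sim == 1, get their own bin), and every
    similarity falling in the same bin shares one weight -- unlike the
    position-shared weights of a CNN, where the weight depends on where
    in the sentence the match occurs.
    """
    bin_sums = np.zeros(n_bins + 1)
    for s in sim_row:
        if s == 1.0:
            k = n_bins                          # dedicated exact-match bin
        else:
            k = int((s + 1.0) / 2.0 * n_bins)   # map [-1, 1) onto bins 0..n_bins-1
        bin_sums[k] += s
    # one shared weight per bin, applied to the summed evidence in that bin
    return float(np.dot(weights, bin_sums))

w = np.linspace(0.0, 1.0, 11)        # toy weights: higher-similarity bins count more
row = np.array([0.9, -0.2, 1.0, 0.3])
score = value_shared_score(row, w)
```

Varying `n_bins` here corresponds to the bin-count hyperparameter tuned on validation data in the results above: more bins distinguish similarity values more finely at the cost of more parameters.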
Conclusion
  • The authors propose an attention based neural matching model for ranking short answer text.
  • Unlike previous methods based on CNNs [34, 18] and LSTMs [25], which show inferior results without combining additional features, the model can achieve better performance than the state-of-the-art method using linguistic feature engineering without any additional features.
  • The authors will study other deep learning architectures for answer ranking and extend the work to non-factoid question answering data sets
Tables
  • Table 1: The statistics of the TREC QA data set
  • Table 2: Examples of learned question term importance by aNMM-1
  • Table 3: The comparison of aNMM-1/aNMM-2 with aNMM-IDF, a degenerate version of our model where IDF directly replaces the output of the question attention network
  • Table 4: Results of TREC QA on TRAIN and TRAIN-ALL without combining additional features (compared with deep learning methods)
  • Table 5: Results of TREC QA on TRAIN-ALL without combining additional features (compared with methods using feature engineering)
  • Table 6: Results of TREC QA on TRAIN and TRAIN-ALL when combining additional features
  • Table 7: Overview of previously published systems on the QA answer ranking task. All reported results are from the best setting of each model trained on TRAIN-ALL data
Related work
  • Our work is related to several research areas, including deep learning models for text matching, factoid question answering, answer ranking in CQA and answer passage / sentence retrieval.

    Deep Learning Models for Text Matching. Recently there have been many deep learning models proposed for text matching and ranking, including DSSM [7], CDSSM [4, 19], ARC-I/ARC-II [6], DCNN [10], DeepMatch [13], MultiGranCNN [33] and MatchPyramid [15]. DSSM performs a non-linear projection to map the query and the documents to a common semantic space. The neural network models are trained using clickthrough data such that the conditional likelihood of the clicked document given the query is maximized. DeepMatch uses a topic model to construct the interactions between two texts and then makes different levels of abstractions with a deep architecture to model the relationships between topics. ARC-I and ARC-II are two different architectures proposed by Hu et al. [6] for matching natural language sentences. ARC-I first finds a representation of each sentence and then compares the representations of the two sentences with a multi-layer perceptron (MLP). The drawback of ARC-I is that it defers the interaction between the two sentences until their individual representations mature in the convolution model, and therefore risks losing details that could be important for the matching task. ARC-II, on the other hand, is built directly on the interaction space between the two sentences: it makes the sentences meet before their own high-level representations mature, while still retaining the space for individual development of abstraction of each sentence. Our aNMM architecture adopts a design similar to ARC-II in the QA matching matrix, where we build neural networks directly on the interactions of sentence term pairs. However, we adopt value-shared weights instead of the position-shared weights used in the CNN of ARC-II, and we add an attention scheme to learn question term importance.
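The interaction space that aNMM shares with ARC-II can be sketched as a QA matching matrix of term-pair cosine similarities. The sketch below uses random vectors for brevity; the paper builds this matrix from pre-trained word embeddings:

```python
import numpy as np

def qa_matching_matrix(q_emb, a_emb):
    """Cosine similarity between every question term and every answer term.

    q_emb: (M, d) question term embeddings; a_emb: (N, d) answer term
    embeddings. Returns an (M, N) matrix with entries in [-1, 1]; row j
    is the similarity profile of question term j against the whole answer.
    """
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    a = a_emb / np.linalg.norm(a_emb, axis=1, keepdims=True)
    return q @ a.T

rng = np.random.default_rng(0)
# a 3-term question against a 7-term answer
P = qa_matching_matrix(rng.normal(size=(3, 50)), rng.normal(size=(7, 50)))
```

Building the network directly on this matrix (rather than on each sentence's own representation, as in ARC-I) is what lets term-pair matching signals survive into the upper layers.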
Funding
  • This work was supported in part by the Center for Intelligent Information Retrieval, in part by NSF IIS-1160894, and in part by NSF grant #IIS-1419693
References
  • W. B. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice. Addison-Wesley Publishing Company, USA, 1st edition, 2009.
  • O. Etzioni. Search needs a shake-up. Nature, 476(7358):25–26, Aug. 2011.
  • Y. Ganjisaffar, R. Caruana, and C. Lopes. Bagging gradient-boosted trees for high precision, low variance ranking models. In SIGIR ’11, pages 85–94, New York, NY, USA, 2011. ACM.
  • J. Gao, P. Pantel, M. Gamon, X. He, and L. Deng. Modeling interestingness with deep neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 2–13, 2014.
  • M. Heilman and N. A. Smith. Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 1011–1019, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
  • B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2042–2050. Curran Associates, Inc., 2014.
  • P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM ’13, pages 2333–2338, New York, NY, USA, 2013. ACM.
  • M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, and H. Daumé III. A neural network for factoid question answering over paragraphs. In EMNLP ’14, 2014.
  • P. Jansen, M. Surdeanu, and P. Clark. Discourse Complements Lexical Semantics for Non-factoid Answer Reranking. In Proceedings of ACL’14, pages 977–986.
  • N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, June 2014.
  • M. Keikha, J. H. Park, and W. B. Croft. Evaluating Answer Passages Using Summarization Measures. In Proceedings of SIGIR’14, 2014.
  • M. Keikha, J. H. Park, W. B. Croft, and M. Sanderson. Retrieving Passages and Finding Answers. In Proceedings of ADCS’14, pages 81–84, 2014.
  • Z. Lu and H. Li. A deep architecture for matching short texts. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 1367–1375. Curran Associates, Inc., 2013.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc., 2013.
  • L. Pang, Y. Lan, J. Guo, J. Xu, S. Wan, and X. Cheng. Text matching as image recognition. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA., pages 2793–2799, 2016.
  • X. Qiu and X. Huang. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 1305–1311. AAAI Press, 2015.
  • A. Severyn and A. Moschitti. Automatic feature engineering for answer selection and extraction. In EMNLP ’13, pages 458–467, 2013.
  • A. Severyn and A. Moschitti. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages 373–382, New York, NY, USA, 2015. ACM.
  • Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. A latent semantic model with convolutional-pooling structure for information retrieval. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pages 101–110, 2014.
  • M. Su and M. Basu. Gating improves neural network performance. In Neural Networks, 2001. Proceedings. IJCNN ’01. International Joint Conference on, volume 3, pages 2159–2164 vol.3, 2001.
  • H. Sun, H. Ma, W.-t. Yih, C.-T. Tsai, J. Liu, and M.-W. Chang. Open domain question answering via semantic enrichment. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, pages 1045–1055, New York, NY, USA, 2015. ACM.
  • M. Surdeanu, M. Ciaramita, and H. Zaragoza. Learning to rank answers on large online QA collections. In ACL ’08, pages 719–727, 2008.
  • M. Surdeanu, M. Ciaramita, and H. Zaragoza. Learning to rank answers to non-factoid questions from web collections. Comput. Linguist., 37(2):351–383, June 2011.
  • K. Tymoshenko and A. Moschitti. Assessing the impact of syntactic and semantic structures for answer passages reranking. In CIKM ’15, pages 1451–1460, New York, NY, USA, 2015. ACM.
  • D. Wang and E. Nyberg. A long short-term memory model for answer sentence selection in question answering. In ACL ’15, pages 707–712, 2015.
  • M. Wang and C. D. Manning. Probabilistic tree-edit models with structured latent variables for textual entailment and question answering. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pages 1164–1172, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
  • M. Wang, N. A. Smith, and T. Mitamura. What is the Jeopardy model? a quasi-synchronous grammar for QA. In Proceedings of the EMNLP-CoNLL, pages 22–32, Prague, Czech Republic, 2007. Association for Computational Linguistics.
  • Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Inf. Retr., 13(3):254–270, June 2010.
  • X. Xue, J. Jeon, and W. B. Croft. Retrieval Models for Question and Answer Archives. In Proceedings of SIGIR’08, pages 475–482, 2008.
  • L. Yang, Q. Ai, D. Spina, R. Chen, L. Pang, W. B. Croft, J. Guo, and F. Scholer. Beyond factoid QA: effective methods for non-factoid answer sentence retrieval. In Advances in Information Retrieval - 38th European Conference on IR Research, ECIR 2016, Padua, Italy, March 20-23, 2016. Proceedings, pages 115–128, 2016.
  • X. Yao, B. V. Durme, C. Callison-Burch, and P. Clark. Answer extraction as sequence tagging with tree edit distance. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pages 858–867, 2013.
  • W.-t. Yih, M.-W. Chang, C. Meek, and A. Pastusiak. Question answering using enhanced lexical semantic models. In ACL ’13, pages 1744–1753, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
  • W. Yin and H. Schütze. Multigrancnn: An architecture for general matching of text chunks on multiple levels of granularity. In ACL ’15, pages 63–73, Beijing, China, July 2015. Association for Computational Linguistics.
  • L. Yu, K. M. Hermann, P. Blunsom, and S. Pulman. Deep Learning for Answer Sentence Selection. In NIPS Deep Learning Workshop, Dec. 2014.