A Deep Relevance Matching Model for Ad-hoc Retrieval

Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016): 55-64


Abstract

In recent years, deep neural networks have led to exciting breakthroughs in speech recognition, computer vision, and natural language processing (NLP) tasks. However, there have been few positive results of deep models on ad-hoc retrieval tasks. This is partially due to the fact that many important characteristics of the ad-hoc retrieval ...

Introduction
  • Machine learning methods have been successfully applied to information retrieval (IR) in recent years.
  • Deep neural networks, as a representation learning method, are able to discover from the training data the hidden structures and features at different levels of abstraction that are useful for the tasks.
  • Deep models have been applied to a variety of applications in computer vision [16], speech recognition [10] and NLP [25, 17], and have yielded significant performance improvements.
  • There have been few positive results of deep models on IR tasks, especially ad-hoc retrieval tasks, until now.
Highlights
  • Machine learning methods have been successfully applied to information retrieval (IR) in recent years
  • We show that most existing deep matching models are designed for semantic matching rather than relevance matching
  • If we directly apply these deep matching models on some benchmark retrieval collections, e.g. TREC collections, we find relatively poor performance compared to traditional ranking models, such as the language model [31] and BM25 [22]
  • Based on the above analysis, we propose a novel deep matching model designed for relevance matching in ad-hoc retrieval by explicitly addressing the three factors described in Section 3
  • Experimental results on two representative benchmark datasets show that our model can significantly outperform traditional retrieval models as well as state-of-the-art deep matching models
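The proposed model (DRMM) builds its input from local interactions between each query term and all document terms. Below is a minimal NumPy sketch of that idea: cosine similarities are bucketed into a fixed-length count-based matching histogram, with a dedicated bin for exact matches. The function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def matching_histogram(query_vec, doc_vecs, bins=5):
    """Count-based matching histogram for one query term.

    Cosine similarities between the query term and every document term
    are bucketed into `bins` intervals over [-1, 1), with one extra bin
    reserved for exact matches (similarity == 1).
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per document term
    hist = np.zeros(bins + 1)
    exact = np.isclose(sims, 1.0)
    hist[bins] = exact.sum()          # exact-match bin
    rest = sims[~exact]
    idx = np.minimum(((rest + 1.0) / 2.0 * bins).astype(int), bins - 1)
    np.add.at(hist, idx, 1)
    return hist
```

The histogram has fixed length regardless of document length, which is one way to handle the variable-length local interactions the paper emphasizes.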
Methods
  • The authors conduct experiments to demonstrate the effectiveness of the proposed model.

    To conduct experiments, the authors use two TREC collections, Robust and ClueWeb-09-Cat-B.
  • Note that ClueWeb-09-Cat-B is filtered to the set of documents with spam scores in the 60th percentile, using the Waterloo Fusion spam scores [3].
  • For both datasets, the authors made use of both the title and the description of each TREC topic in the experiments.
  • Stopword removal is performed on query words during retrieval using the INQUERY stop list [2]
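As a toy illustration of the query-side stopping step (the handful of words below merely stand in for the actual INQUERY stop list, which is far larger):

```python
# A tiny illustrative subset; the real INQUERY stop list contains
# several hundred entries.
STOPWORDS_SAMPLE = {"a", "an", "and", "the", "of", "in", "on", "for"}

def filter_query(query):
    """Lowercase, tokenize on whitespace, and drop stopwords."""
    return [t for t in query.lower().split() if t not in STOPWORDS_SAMPLE]
```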
Results
  • Evaluation Methodology

    Given the limited number of queries for each collection, the authors conduct 5-fold cross-validation to minimize over-fitting without reducing the number of learning instances.
  • The final fold in each case is used to evaluate the optimal parameters.
  • This process is repeated 5 times, once for each fold.
  • Mean average precision (MAP) is the optimized metric for all retrieval models.
  • The top-ranked 1,000 documents are compared using mean average precision (MAP), normalized discounted cumulative gain at rank 20 (nDCG@20), and precision at rank 20 (P@20).
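The three measures above can be computed directly from a ranked list of binary relevance labels; a minimal sketch using the standard definitions (not the authors' evaluation scripts):

```python
import math

def precision_at_k(ranked_rels, k=20):
    """ranked_rels: 0/1 relevance labels in ranked order."""
    return sum(ranked_rels[:k]) / k

def average_precision(ranked_rels, num_relevant):
    """AP for one query; MAP is the mean of this over all queries."""
    hits, score = 0, 0.0
    for i, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / num_relevant if num_relevant else 0.0

def ndcg_at_k(ranked_rels, k=20):
    dcg = sum(r / math.log2(i + 1) for i, r in enumerate(ranked_rels[:k], start=1))
    ideal = sorted(ranked_rels, reverse=True)
    idcg = sum(r / math.log2(i + 1) for i, r in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg else 0.0
```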
Conclusion
  • The authors point out that there are significant differences between semantic matching for many NLP tasks and relevance matching for the ad-hoc retrieval task.
  • Many existing deep matching models designed for the semantic matching problem may not fit the ad-hoc retrieval task.
  • The authors may include phrase embeddings so that phrases can be treated as a whole rather than separate terms.
  • In this way, the authors expect that the local interactions can better reflect meaning by using the proper semantic units of language, leading to better retrieval performance.
Tables
  • Table1: Statistics of the TREC collections used in this study. The ClueWeb-09-Cat-B collection has been filtered to the set of documents in the 60th percentile of spam scores
  • Table2: Comparison of different retrieval models over the Robust-04 and ClueWeb-09-Cat-B collections. Significant improvement or degradation with respect to QL is indicated (+/-) (p-value ≤ 0.05)
  • Table3: Performance comparison of DRMM over different dimensionality of term embeddings trained by CBOW on the Robust04 collection
Related work
  • By formalizing ad-hoc retrieval as a text matching problem, deep matching models can be applied to this task so that features can be automatically acquired in an end-to-end way. In recent years, a variety of deep matching models have been proposed for text matching problems. As mentioned before, we can categorize the existing deep matching models into two major types, namely representation-focused models and interaction-focused models. We have described several representative deep matching models in these two classes in previous sections, including DSSM, CDSSM, ARC-I, ARC-II and MatchPyramid. Here we will discuss some other related work in this direction.

    In the class of representation-focused models, Qiu and Huang [21] proposed the Convolutional Neural Tensor Network (CNTN) for community-based question answering. The CNTN model is similar to ARC-I, using a CNN to build the representation of each piece of text. The major difference between CNTN and ARC-I is that CNTN employs a tensor layer rather than an MLP on top of the two CNNs to compute the matching score between the two pieces of text. In [25], Socher et al. proposed an Unfolding Recursive Autoencoder (uRAE) for paraphrase identification. They first employed recursive autoencoders to build hierarchical compositional text representations based on syntactic trees, and then conducted matching at different levels for the identification task. In [30], Yin and Schütze introduced MultiGranCNN, which employs a CNN to obtain hierarchical representations of texts, and then computes the matching score based on the interactions between these multigranular representations.
Funding
  • This work was supported in part by the Center for Intelligent Information Retrieval, in part by the 973 Program of China under Grants No. 2014CB340401 and 2013CB329606, in part by the National Natural Science Foundation of China under Grants No. 61232010, 61472401, 61425016, and 61203298, and in part by the Youth Innovation Promotion Association CAS under Grants No. 20144310 and 2016102.
Study subjects and analysis
documents: 1000
Throughout this paper each displayed evaluation statistic is the average of the five fold-level evaluation values. For evaluation, the top-ranked 1,000 documents are compared using mean average precision (MAP), normalized discounted cumulative gain at rank 20 (nDCG@20), and precision at rank 20 (P@20). Statistical differences between models are computed using the Fisher randomization test [24] (α = 0.05)
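The Fisher randomization test can be sketched as a paired permutation test over per-query scores: under the null hypothesis the two systems are exchangeable, so each per-query pair of scores may be swapped. A minimal, illustrative implementation (not the authors' code):

```python
import random

def fisher_randomization_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided Fisher randomization test on paired per-query scores.

    The p-value is the fraction of random swap patterns whose absolute
    mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / len(diffs)
    extreme = 0
    for _ in range(n_perm):
        perm = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(perm)) / len(perm) >= observed:
            extreme += 1
    return extreme / n_perm
```

A difference is reported as significant when the returned p-value is at most the chosen α (0.05 in the paper).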

ranked documents: 2000
Note that for all the deep matching models, we adopt a re-ranking strategy for efficient computation. An initial retrieval is performed using the QL model to obtain the top 2,000 ranked documents. We then use the deep matching models to re-rank these top results
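The two-stage strategy above can be sketched as follows; `deep_model_score` stands in for any of the deep matching models and is a hypothetical callable, not an API from the paper:

```python
def rerank(query, ql_scores, deep_model_score, initial_k=2000, final_k=1000):
    """Re-ranking pipeline: QL retrieval gives an initial candidate pool,
    a deep matching model then re-scores only that pool.

    ql_scores: dict mapping doc_id -> QL score over the collection.
    deep_model_score: callable (query, doc_id) -> relevance score.
    """
    # Step 1: initial retrieval, keep the top `initial_k` docs by QL score.
    candidates = sorted(ql_scores, key=ql_scores.get, reverse=True)[:initial_k]
    # Step 2: re-score candidates with the deep model, keep top `final_k`.
    rescored = sorted(candidates, key=lambda d: deep_model_score(query, d),
                      reverse=True)
    return rescored[:final_k]
```

Note that a document outside the initial QL pool can never be recovered by the re-ranker, which is the efficiency/recall trade-off of this design.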

documents: 1000
We then use the deep matching models to re-rank these top results. The top-ranked 1,000 documents are then used for comparison.

negative samples: 10
the Robust04 and ClueWeb-09-Cat-B collections, respectively. Specifically, we used 10 as the context window size and used 10 negative samples and a subsampling of frequent words with a sampling threshold of 10⁻⁴, as suggested by Word2Vec. Each corpus was pre-processed by removing HTML tags and stemming
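The subsampling threshold of 10⁻⁴ follows the heuristic from the word2vec paper, where frequent words are randomly discarded during training. A sketch of the keep-probability it implies (the simplified formula from the paper; the word2vec C implementation uses a slight variant):

```python
import math

def word2vec_keep_prob(word_count, total_words, sample=1e-4):
    """Probability of keeping one occurrence of a word under
    word2vec-style subsampling with threshold `sample`.

    Words whose corpus frequency is at or below the threshold are
    always kept; more frequent words are kept with probability
    sqrt(sample / frequency).
    """
    freq = word_count / total_words
    if freq <= sample:
        return 1.0
    return math.sqrt(sample / freq)
```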

References
  • C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML, pages 89–96. ACM, 2005.
  • J. P. Callan, W. B. Croft, and J. Broglio. TREC and TIPSTER experiments with INQUERY. IPM, 31(3):327–343, 1995.
  • G. V. Cormack, M. D. Smucker, and C. L. Clarke. Efficient and effective spam filtering and re-ranking for large web datasets. Information Retrieval, 14(5):441–465, 2011.
  • J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121–2159, 2011.
  • H. Fang, T. Tao, and C. Zhai. A formal study of information retrieval heuristics. In SIGIR, pages 49–56. ACM, 2004.
  • H. Fang, T. Tao, and C. Zhai. Diagnostic evaluation of information retrieval models. TOIS, 29(2):7, 2011.
  • H. Fang and C. Zhai. Semantic term matching in axiomatic approaches to information retrieval. In SIGIR, pages 115–122. ACM, 2006.
  • J. Gao, P. Pantel, M. Gamon, X. He, L. Deng, and Y. Shen. Modeling interestingness with deep neural networks. In EMNLP, October 2014.
  • R. Caruana, S. Lawrence, and C. L. Giles. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In NIPS, volume 13, page 402. MIT Press, 2001.
  • G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
  • B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050, 2014.
  • P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In CIKM, pages 2333–2338. ACM, 2013.
  • N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.
  • T. Kenter and M. de Rijke. Short text similarity with word embeddings. In CIKM, pages 1411–1420. ACM, 2015.
  • R. Krovetz. Viewing morphology as an inference process. In SIGIR, pages 191–202. ACM, 1993.
  • Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10), 1995.
  • Z. Lu and H. Li. A deep architecture for matching short texts. In NIPS, pages 1367–1375, 2013.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
  • L. Pang, Y. Lan, J. Guo, J. Xu, S. Wan, and X. Cheng. Text matching as image recognition. In AAAI, 2016.
  • J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543, 2014.
  • X. Qiu and X. Huang. Convolutional neural tensor network architecture for community-based question answering. In IJCAI, pages 1305–1311, 2015.
  • S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR, pages 232–241. ACM, 1994.
  • Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. Learning semantic representations using convolutional neural networks for web search. In WWW, pages 373–374, 2014.
  • M. D. Smucker, J. Allan, and B. Carterette. A comparison of statistical significance tests for information retrieval evaluation. In CIKM, pages 623–632. ACM, 2007.
  • R. Socher, E. H. Huang, J. Pennington, C. D. Manning, and A. Y. Ng. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In NIPS, pages 801–809, 2011.
  • S. Wan, Y. Lan, J. Guo, J. Xu, L. Pang, and X. Cheng. A deep architecture for semantic matching with multiple positional sentence representations. arXiv preprint arXiv:1511.08277, 2015.
  • S. Wan, Y. Lan, J. Xu, J. Guo, L. Pang, and X. Cheng. Match-SRNN: Modeling the recursive matching structure with spatial RNN. In IJCAI, 2016.
  • M. Wang, Z. Lu, H. Li, and Q. Liu. Syntax-based deep matching of short texts. arXiv preprint arXiv:1503.02427, 2015.
  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
  • W. Yin and H. Schütze. MultiGranCNN: An architecture for general matching of text chunks on multiple levels of granularity. In ACL, pages 63–73, 2015.
  • C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334–342. ACM, 2001.
  • G. Zheng and J. Callan. Learning to reweight terms with distributed representations. In SIGIR, pages 575–584. ACM, 2015.