Sparse, Dense, and Attentional Representations for Text Retrieval

Transactions of the Association for Computational Linguistics, (2021): 329–345


Abstract

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish…
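As a minimal illustration of the scoring rule described in the abstract (a sketch, not the authors' implementation; `encode` is a hypothetical stand-in for a trained encoder such as BERT with a projection head):

```python
import numpy as np

def encode(text: str, dim: int = 128) -> np.ndarray:
    # Placeholder encoder: a fixed pseudo-random vector per string
    # (within one process). A real dual encoder would use a trained model.
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

docs = ["dense retrieval uses learned embeddings",
        "BM25 is a sparse bag-of-words model"]
doc_matrix = np.stack([encode(d) for d in docs])  # one row per document
query_vec = encode("what is dense retrieval")
scores = doc_matrix @ query_vec                   # inner-product scoring
print(docs[int(np.argmax(scores))])
```

Because scoring reduces to a matrix-vector product, the document matrix can be precomputed and indexed for fast maximum inner product search.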

Introduction
  • Retrieving relevant documents is a core task for language technology, and is a component of other applications such as information extraction (e.g., Narasimhan et al., 2016) and question answering (e.g., Kwok et al., 2001; Voorhees, 2001).
  • Typically, documents are first retrieved using sparse, high-dimensional query/document representations, and are then reranked with learned neural models (see Mitra and Craswell (2018) for an overview).
  • This two-stage approach is powerful and has achieved state-of-the-art results on multiple IR benchmarks (Nogueira and Cho, 2019; Yang et al., 2019; Nogueira et al., 2019a), especially since large-scale annotated data has become available for training deep neural models (Dietz et al., 2018; Craswell et al., 2020); a schematic sketch of the pipeline follows this list.
  • One approach to take advantage of neural models while still employing sparse term-based retrieval is to expand the documents with neural models before indexing (Nogueira et al., 2019b) or to learn contextual term weights (Dai and Callan, 2020)
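The sketch below shows the two-stage shape of the pipeline described above: a cheap first-stage retriever prunes the collection, then an expensive neural scorer reranks the survivors. `bm25_score` and `neural_score` are hypothetical placeholders, not the systems cited above.

```python
def two_stage_search(query, collection, bm25_score, neural_score, k=1000):
    # Stage 1: score every document with the cheap sparse model and keep top-k.
    first = sorted(collection, key=lambda d: bm25_score(query, d), reverse=True)
    candidates = first[:k]
    # Stage 2: rerank only the k survivors with the expensive neural model.
    return sorted(candidates, key=lambda d: neural_score(query, d), reverse=True)
```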
Highlights
  • Retrieving relevant documents is a core task for language technology, and is a component of other applications such as information extraction (e.g., Narasimhan et al., 2016) and question answering (e.g., Kwok et al., 2001; Voorhees, 2001)
  • While classical information retrieval has focused on heuristic weights for sparse bag-of-words representations (Spärck Jones, 1972), more recent work has adopted a two-stage retrieval and ranking pipeline, where a large number (e.g., 1,000) of candidates is first retrieved using sparse representations and then reranked with learned neural models
  • Recent history in NLP might suggest that learned dense representations should always outperform sparse features, but this is not necessarily true: as shown in Figure 1, the BM25 model (Robertson et al., 2009) can outperform a dual encoder based on BERT on longer documents (see § 7). This raises questions about the utility and limitations of dual encoders, and the circumstances in which these powerful models do not yet reach the state of the art. We explore these questions using both theoretical and empirical tools, and propose new architectures that leverage the strengths of dual encoders while avoiding some of their weaknesses
  • We focus on the capacity of the dual encoder model, because capacity limitations impose a strict upper bound on performance, and because they do not depend on details of the training data and learning algorithm
  • We have mentioned research improving the accuracy of retrieval and ranking from a large space throughout the paper
  • The computational demands of large-scale retrieval push us to seek other architectures: cross attention over contextualized embeddings is too slow, but dual encoding over fixed-length vectors may be insufficiently expressive, failing even to match the performance of sparse bag-of-words competitors. We have used both theoretical and empirical techniques to characterize the limitations of fixed-length dual encoders, focusing on the role of document length
Methods
  • The authors' theoretical results relate the dimensionality of compressive dual encoders to their ability to accurately approximate rankings defined by bag-of-words representations like BM25; a generic form of this connection is sketched after this list.
  • The distribution of natural language texts may have a special structure
  • This in turn could enable precise approximation of sparse bag-of-words models with a lower-dimensional compressive dual encoder.
  • Dual encoders can introduce trained distributed representations of texts, better equipped to capture graded notions of semantic similarity
  • If they cannot make the distinctions that sparse models make, they could suffer from a performance ceiling
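The flavor of the dimensionality–ranking connection can be sketched with the standard random-projection guarantee (a textbook statement, not the paper's exact bound): for a matrix $A \in \mathbb{R}^{k \times v}$ with i.i.d. Rademacher entries scaled by $1/\sqrt{k}$,

$$\Pr\Big[\,\big|\langle A x,\, A y\rangle - \langle x,\, y\rangle\big| \ge \varepsilon\,\lVert x\rVert\,\lVert y\rVert\,\Big] \le \delta \quad \text{once } k = \Omega\!\big(\varepsilon^{-2}\log(1/\delta)\big).$$

A ranking triple $(q, d_1, d_2)$ is preserved whenever the sparse-model margin $\langle q,\, d_1 - d_2\rangle$ exceeds the approximation error, which is how the required dimension $k$ comes to depend on the margin and, through the vector norms, on document length.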
Results
  • The state-of-the-art prior work follows the two-stage retrieval and reranking approach, where an efficient first-stage system retrieves a list of candidates from the document collection, and a more expensive second-stage model, such as cross-attention BERT, reranks the candidates.
  • The authors' focus is on improving the first, efficient retrieval stage, and they compare to prior work in two settings: Retrieval (the top part of the table), where only first-stage efficient retrieval systems are used, and Reranking (the bottom part of the table), where more expensive second-stage models are employed to re-rank candidates.
  • Among the comparison systems: DeepCT-Index produces term weights that can be stored in an ordinary inverted index for first-stage passage retrieval; IDST is a two-stage cascade ranking pipeline proposed by Yan et al. (2020), which used both document expansion and cross-attention ensemble reranking with tailored BERT model pre-training; and Leaderboard is the best reported development-set score on the MS MARCO passage leaderboard
Conclusion
  • Transformers perform well on an unreasonable range of problems in natural language processing.
  • The computational demands of large-scale retrieval push them to seek other architectures: cross attention over contextualized embeddings is too slow, but dual encoding over fixed-length vectors may be insufficiently expressive, failing even to match the performance of sparse bag-of-words competitors
  • The authors have used both theoretical and empirical techniques to characterize the limitations of fixed-length dual encoders, focusing on the role of document length.
Tables
  • Table 1: Short answer exact match on the Natural Questions open-domain test set for retrieval models over collections with varying document length
  • Table 2: Results on MS MARCO-Passage (MS-Passage), MS MARCO-Document (MS-Doc) and TREC-CAR datasets. We report MRR@10 and MAP@1000 to align with prior work. For the MS MARCO datasets, results are on the development set; the TREC-CAR results are on the test set
  • Table 3: MRR@10 when reranking at different depths for the MS MARCO passage and document tasks
Related work
  • We have mentioned research improving the accuracy of retrieval and ranking from a large space throughout the paper. Here we focus on prior work related to our research questions on the capacity of dense dual encoder representations relative to sparse high-dimensional bag-of-words ones.

    A number of other works relate to the general problem of recovering bag-of-words representations from dense encodings. For example, the literature on compressive sensing shows that it is possible to recover a bag-of-words vector x from the projection Ax for suitable A. Bounds on the sufficient dimensionality of isotropic Gaussian projections (Candes and Tao, 2005; Arora et al., 2018) are a factor of T log v worse than the bound described in § 3, but this is unsurprising, because the task of recovering a bag of words from a compressed measurement is strictly harder than recovering inner products (a toy recovery sketch follows).
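To make the recovery claim concrete, here is a minimal sketch (not from the paper) that recovers a sparse bag-of-words vector from a Gaussian random projection using orthogonal matching pursuit; all dimensions are illustrative:

```python
import numpy as np

def omp(A, b, sparsity):
    # Orthogonal matching pursuit: greedily pick the column most correlated
    # with the residual, then refit a least-squares solution on the support.
    residual, support = b.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x = np.zeros(A.shape[1])
        x[support] = coef
        residual = b - A @ x
    return x

rng = np.random.default_rng(0)
v, T, k = 1000, 10, 200                     # vocab size, doc length, proj. dim
x = np.zeros(v)
x[rng.choice(v, T, replace=False)] = 1.0    # a T-word boolean bag of words
A = rng.standard_normal((k, v)) / np.sqrt(k)
x_hat = omp(A, A @ x, T)
print(np.allclose(x, x_hat, atol=1e-6))     # recovery succeeds w.h.p.
```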

    Subramani et al. (2019) ask whether it is possible to exactly recover sentences (token sequences) from pretrained decoders, using vector embeddings that are added as a bias to the decoder hidden state. Because their decoding model is more expressive (and thus more computationally intensive) than inner-product retrieval, the theoretical bounds derived here do not apply. Nonetheless, Subramani et al. empirically observe a similar dependence between sentence length and embedding size. Wieting and Kiela (2019) represent sentences as bags of random projections, finding that high-dimensional projections (k = 4096) perform nearly as well as trained encoding models such as SkipThought (Kiros et al., 2015) and InferSent (Conneau et al., 2017). These results may lend further empirical support to the hypothesis that bag-of-words vectors from real text are “hard to embed” in the sense of Larsen and Nelson (2017). Our contribution is to systematically explore the relationship between document length and encoding dimension, focusing on the case of exact inner-product retrieval. Approximate retrieval (Indyk and Motwani, 1998; Har-Peled et al., 2012) is often necessary in practice. We leave the combination of representation learning and approximate retrieval for future work.
Funding
  • Using the MS MARCO document retrieval dataset (see § 9 for data processing details), we evaluate the ability of Rademacher random projections to achieve accuracy of at least 95% on pairwise rankings (q, d1, d2), with respect to both boolean (Figure 2) and BM25 sparse representations (Figure 3); a minimal sketch of this protocol follows the list
  • BM25-bi achieves over 90% accuracy across document collections for this task
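The sketch below mirrors the evaluation just described, with synthetic sparse vectors standing in for the paper's MS MARCO representations: project queries and documents with a Rademacher matrix and measure how often inner products in the compressed space preserve the pairwise ranking (q, d1, d2).

```python
import numpy as np

rng = np.random.default_rng(0)
v, k, n_triples, doc_len = 5000, 512, 2000, 100

# Rademacher projection: entries +/- 1/sqrt(k).
A = rng.choice([-1.0, 1.0], size=(k, v)) / np.sqrt(k)

def sparse_doc(query_ids, n_shared):
    # A document sharing `n_shared` terms with the query, plus filler terms.
    ids = np.concatenate([rng.choice(query_ids, n_shared, replace=False),
                          rng.choice(v, doc_len - n_shared, replace=False)])
    x = np.zeros(v)
    x[ids] = rng.random(ids.size)
    return x

correct = 0
for _ in range(n_triples):
    q_ids = rng.choice(v, 10, replace=False)
    q = np.zeros(v)
    q[q_ids] = rng.random(10)
    d1, d2 = sparse_doc(q_ids, 3), sparse_doc(q_ids, 1)
    exact = (q @ d1) > (q @ d2)                      # sparse-model ranking
    approx = (A @ q) @ (A @ d1) > (A @ q) @ (A @ d2)  # compressed ranking
    correct += int(exact == approx)
print(f"pairwise ranking accuracy: {correct / n_triples:.3f}")
```

Raising k tightens the inner-product approximation and drives the agreement rate toward 1, which is the dimension/accuracy trade-off the figures measure.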
Study subjects and analysis
random samples: 100
6.2 Learning and Inference. For the experiments in § 7 and § 8, all trained models are initialized from pre-trained BERT-base, and all parameters are fine-tuned using a cross-entropy loss with 7 sampled negatives from a precomputed 200-document list and additional in-batch negatives (for a total of 1024 candidates in a batch); the pre-computed candidates include the 100 top neighbors from BM25 and 100 random samples (a sketch of the loss follows). This is similar to the method of Lee et al. (2019), but with additional fixed candidates, also used in concurrent work (Karpukhin et al., 2020)
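A minimal sketch of this kind of training loss, assuming only the in-batch negatives (shapes and names are illustrative, not the authors' code): a softmax cross-entropy where every other passage in the batch serves as a negative for each query.

```python
import numpy as np

def in_batch_softmax_loss(q_emb, p_emb):
    # q_emb: [B, d] query embeddings; p_emb: [B, d] gold-passage embeddings.
    # Row i's positive is column i; all other columns are in-batch negatives.
    logits = q_emb @ p_emb.T                      # [B, B] similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # NLL of the gold passages

rng = np.random.default_rng(0)
print(in_batch_softmax_loss(rng.standard_normal((8, 128)),
                            rng.standard_normal((8, 128))))
```

The paper's setup additionally appends fixed sampled negatives to each row's candidate set; that only widens the logits matrix.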

QA pairs: 87925
We follow the setup in Lee et al. (2019). There are 87,925 QA pairs in training and 3,610 QA pairs in the test set. We hold out a subset of training for development.

random samples: 100
For document retrieval, a passage is correct for a query x if it contains a string that exactly matches one of the annotator-provided short answers for the question. We form a reranking task by considering the top 100 results from BM25-uni and another 100 random samples, and also consider the full retrieval setting (a sketch of this candidate construction follows). To define candidates for reranking and model training, BM25-uni is used here instead of BM25-bi, because it is the stronger sparse retrieval model for this task
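A minimal sketch of how such a 200-candidate pool could be assembled; `bm25_rank` is a hypothetical placeholder for a BM25 retriever over hashable document ids.

```python
import random

def build_candidates(query, collection, bm25_rank, n_top=100, n_rand=100, seed=0):
    # Top-n_top BM25 results plus n_rand random distractors from the rest.
    top = bm25_rank(query, collection)[:n_top]
    pool = [d for d in collection if d not in set(top)]
    rnd = random.Random(seed).sample(pool, n_rand)
    return top + rnd   # 200 candidates; gold-passage handling is done elsewhere
```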

top scoring documents: 100
Unlike the reranking setting, only higher-dimensional DE-BERT models outperform BM25. We also explore an efficient BM25-uni/neural hybrid: each system retrieves its 100 top-scoring documents, and the documents in the union of the 100-best lists are scored using a linear combination of the two systems' scores (a sketch follows). The hybrid models offer large improvements over their components, capturing both precise word overlap and semantic similarity
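A minimal sketch of the hybrid combination described above; the min-backfill for documents missing from one list is my assumption, and `lam` is a tuning weight, not a value from the paper.

```python
def hybrid_rerank(bm25_scores, dense_scores, lam=0.5):
    # bm25_scores, dense_scores: dict doc_id -> score for each top-100 list.
    union = set(bm25_scores) | set(dense_scores)
    fill_b = min(bm25_scores.values())   # assumed backfill for missing docs
    fill_d = min(dense_scores.values())
    combined = {d: lam * bm25_scores.get(d, fill_b)
                   + (1 - lam) * dense_scores.get(d, fill_d)
                for d in union}
    return sorted(combined, key=combined.get, reverse=True)
```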

documents: 100
We train a short answer model for a retrieval system M in a pipeline fashion, by training a BERT-based reader model given the fixed and separately trained retrieval system M. Given a retriever M, a training set of queries x paired with short answer strings a(x), and a document collection Dl, we generate training examples for the reader as follows: for each training query x, we use M to retrieve the top 100 documents of length up to l, and group these into larger segments (blocks) of length up to 400, which are used as inputs to the reading comprehension model; the original passage boundaries are indicated by a special token (a sketch of this blocking step follows). The reader uses the SQuAD 2.0 BERT-base architecture to select an answer span or a NULL answer
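A minimal sketch of the block construction just described: concatenate retrieved passages, in rank order, into segments of at most 400 tokens, marking passage boundaries with a special token. Tokenization details are assumptions, not the authors' setup.

```python
def make_blocks(passages, max_len=400, sep="[SEP]"):
    # passages: ranked list of token lists, each of length <= max_len.
    blocks, current = [], []
    for p in passages:
        if current and len(current) + 1 + len(p) > max_len:
            blocks.append(current)       # flush the full block
            current = []
        current = current + ([sep] if current else []) + p
    if current:
        blocks.append(current)
    return blocks
```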

random samples: 100
[Recovered figure captions: “Min. k for Rademacher embeddings to approximate BM25 pairwise rankings with average error < .05”; “Results on the containing-passage Inverse Cloze Task. Left: reranking with BM25-uni, BM25-bi, and trained models; Right: accuracy of retrieving the passage containing a query from approximately three million candidates”; “BERT-based dense retrieval models compared to BM25, as maximum passage length varies”; “Results on NQ passage recall. Left: reranking of 200 passages; Right: open-domain retrieval over all of (English) Wikipedia.”]

The left panels report test-set reranking, where models must select one of 200 candidate passages: the 100 top results from BM25-bi plus 100 random samples. The right panels report the much more challenging task of retrieval from approximately three million candidates; only models that can efficiently retrieve nearest neighbors from such a large set are evaluated there, and the behavior is similar to the reranking setting, with the multi-vector methods matching BM25-uni for all but the longest documents. For NQ passage recall, systems operating over collections of different-sized passages are compared fairly by letting each model select approximately the same number of tokens (400) and evaluating whether an answer is contained in them: models retrieving passages of length 50 return their top 8 passages, and ones retrieving from D100 return their top 4 (a sketch of this token-budget evaluation follows). In the open-domain setting of retrieving from Wikipedia for each of the four document collections Dl, it was not possible to run Cross-Attention due to computational cost, and unlike in the reranking setting, only higher-dimensional DE-BERT models outperform BM25. An efficient BM25-uni/neural hybrid, in which each system retrieves its 100 top-scoring documents and the union of the 100-best lists is scored with a linear combination of the two systems' scores, offers large improvements over its components, capturing both precise word overlap and semantic similarity.
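A minimal sketch of the token-budget recall metric described above (names are illustrative): each system returns enough top passages to fill roughly 400 tokens, and is scored on whether any returned passage contains a gold answer string.

```python
def recall_at_tokens(ranked_passages, answers, passage_len, budget=400):
    # e.g., passage_len=50 -> keep top 8 passages; passage_len=100 -> top 4.
    n_keep = max(1, budget // passage_len)
    kept = ranked_passages[:n_keep]
    return any(ans in p for p in kept for ans in answers)
```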


Reference
  • Dimitris Achlioptas. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687.
  • Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. 2019. Approximate nearest neighbor search in high dimensions. Proceedings of the International Congress of Mathematicians (ICM 2018).
  • Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, and Kiran Vodrahalli. 2018. A compressed sensing view of unsupervised text embeddings, bag-of-n-grams, and LSTMs. In Proceedings of the International Conference on Learning Representations (ICLR).
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR).
  • Shai Ben-David, Nadav Eiron, and Hans Ulrich Simon. 2002. Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3(Nov):441–461.
  • Emmanuel J. Candes and Terence Tao. 2005. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215.
  • Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 670–680.
  • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. 2020. Overview of the TREC 2019 deep learning track. In Text REtrieval Conference (TREC).
  • Zhuyun Dai and Jamie Callan. 2020. Context-aware sentence/passage term importance estimation for first stage retrieval. Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
  • Laura Dietz, Ben Gamari, Jeff Dalton, and Nick Craswell. 2018. TREC complex answer retrieval overview. In Text REtrieval Conference (TREC).
  • Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano. 2019. Learning dense representations for entity retrieval. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 528–537.
  • Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha. 2016. Quantization based fast inner product search. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 482–490.
  • Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training.
  • Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, Zhanyi Liu, Hua Wu, and Jun Zhao. 2017. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the Association for Computational Linguistics (ACL), pages 221–231.
  • Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. 2012. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(1):321–350.
  • Harold Stanley Heaps. 1978. Information Retrieval, Computational and Theoretical Aspects. Academic Press.
  • Gustav Herdan. 1960. Type-token Mathematics: A Textbook of Mathematical Linguistics, volume 4. Mouton.
  • Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), pages 2333–2338.
  • Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2020. Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring. In Proceedings of the International Conference on Learning Representations (ICLR).
  • Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613.
  • Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering.
  • Ryan Kiros, Yukun Zhu, Ruslan R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems, pages 3294–3302.
  • Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics.
  • Cody Kwok, Oren Etzioni, and Daniel S. Weld. 2001. Scaling question answering to the web. ACM Transactions on Information Systems (TOIS), 19(3):242–262.
  • Kasper Green Larsen and Jelani Nelson. 2017. Optimality of the Johnson-Lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633–638. IEEE.
  • Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the Association for Computational Linguistics (ACL).
  • Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. 2018. Denoising distantly supervised open-domain question answering. In Proceedings of the Association for Computational Linguistics (ACL), pages 1736–1745.
  • Sewon Min, Danqi Chen, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2019. Knowledge guided text retrieval and reading for open domain question answering. arXiv preprint arXiv:1911.03868.
  • Bhaskar Mitra and Nick Craswell. 2018. An introduction to neural information retrieval. Foundations and Trends in Information Retrieval, 13(1):1–126.
  • Karthik Narasimhan, Adam Yala, and Regina Barzilay. 2016. Improving information extraction by acquiring external evidence with reinforcement learning. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 2355–2365.
  • Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset.
  • Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage re-ranking with BERT. CoRR, abs/1901.04085.
  • Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin. 2019a. Multi-stage document ranking with BERT.
  • Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. 2019b. Document expansion by query prediction. CoRR, abs/1904.08375.
  • Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 3982–3992.
  • Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4):333–389.
  • Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, and Hannaneh Hajishirzi. 2019. Real-time open-domain question answering with dense-sparse phrase index. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4430–4441.
  • Karen Spärck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21.
  • Nishant Subramani, Samuel Bowman, and Kyunghyun Cho. 2019. Can unconditional language models recover arbitrary sentences? In Advances in Neural Information Processing Systems, pages 15232–15242.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Santosh S. Vempala. 2004. The Random Projection Method, volume 65. American Mathematical Society.
  • Ellen M. Voorhees. 2001. The TREC question answering track. Natural Language Engineering, 7(4):361–378.
  • John Wieting and Douwe Kiela. 2019. No training required: Exploring random encoders for sentence classification. In Proceedings of the International Conference on Learning Representations (ICLR).
  • Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2019. Zero-shot entity linking with dense entity retrieval.
  • Ming Yan, Chenliang Li, Chen Wu, Bin Bi, Wei Wang, Jiangnan Xia, and Luo Si. 2020. IDST at TREC 2019 deep learning track: Deep cascade ranking with generation-based document expansion and pre-trained language modeling. In Text REtrieval Conference (TREC).
  • Liu Yang, Qingyao Ai, Jiafeng Guo, and W. Bruce Croft. 2016. aNMM: Ranking short answer texts with attention-based neural matching model. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), pages 287–296.
  • Wei Yang, Haotian Zhang, and Jimmy Lin. 2019. Simple applications of BERT for ad hoc document retrieval. CoRR, abs/1903.10972.
  • Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen. 2018. Breaking the softmax bottleneck: A high-rank RNN language model. In Proceedings of the International Conference on Learning Representations (ICLR).
Author
Yi Luan
Michael Collins