Flexible IR Pipelines with Capreolus
CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020
Abstract
While a number of recent open-source toolkits for training and using neural information retrieval models have greatly simplified experiments with neural reranking methods, they essentially hard code a "search-then-rerank" experimental pipeline. These pipelines consist of an efficient first-stage ranking method, like BM25, followed by a neural reranking method. Deviations from this setup often require hacks; some improvements, like adding a second reranking step that uses a more expensive neural method, are infeasible without major code changes. In order to improve the flexibility of such toolkits, we propose implementing experimental pipelines as dependency graphs of functional "IR primitives," which we call modules, that can be used and combined as needed.
For example, a neural IR pipeline may rerank results from a Searcher module that efficiently retrieves results from an Index module that it depends on. In turn, the Index depends on a Collection to index, which is provided by the pipeline. This Searcher module is self-contained: the pipeline does not need to know about or interact with the Index of the Searcher, which is transparently shared among Searcher modules when possible (e.g., a BM25 and a QL Searcher might share the same Index). Similarly, a Reranker module might depend on a Trainer (e.g., Tensorflow), feature Extractor, Tokenizer, etc. In both cases, the pipeline needs to interact only with the Reranker or Searcher directly; the complexity of their dependencies is hidden and intelligently managed. We rewrite the Capreolus toolkit to take this approach and demonstrate its use.
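The dependency-graph idea described above can be illustrated with a minimal sketch. This is not Capreolus's actual API; the class names, configs, and caching scheme below are hypothetical, and only show how modules can declare dependencies that are resolved and transparently shared (e.g., two Searchers reusing one Index):

```python
# Hypothetical sketch of "modules as a dependency graph": each module
# declares its dependencies, and instances are cached by (class, config)
# so that e.g. a BM25 and a QL Searcher share the same underlying Index.

class Module:
    _registry = {}  # shared cache keyed by (class name, frozen config)

    @classmethod
    def create(cls, **config):
        key = (cls.__name__, tuple(sorted(config.items())))
        if key not in Module._registry:
            Module._registry[key] = cls(**config)
        return Module._registry[key]


class Collection(Module):
    def __init__(self, path="/data/docs"):  # path is illustrative only
        self.path = path


class Index(Module):
    def __init__(self, collection_path="/data/docs"):
        # the Index depends on a Collection to index
        self.collection = Collection.create(path=collection_path)


class BM25Searcher(Module):
    def __init__(self, k1=0.9, b=0.4, collection_path="/data/docs"):
        # the Searcher obtains its Index dependency itself; the pipeline
        # never interacts with the Index directly
        self.index = Index.create(collection_path=collection_path)
        self.k1, self.b = k1, b


class QLSearcher(Module):
    def __init__(self, mu=1000, collection_path="/data/docs"):
        self.index = Index.create(collection_path=collection_path)
        self.mu = mu


bm25 = BM25Searcher.create()
ql = QLSearcher.create()
# both searchers resolve to the same cached Index instance
print(bm25.index is ql.index)  # True
```

Because dependency resolution goes through one cache, sharing is a by-product of construction rather than something the pipeline must coordinate, which is the flexibility the abstract argues for.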