Vector search with small radiuses
arXiv (2024)
Abstract
In recent years, the dominant accuracy metric for vector search has been the recall
of a fixed-size result list (top-k retrieval), with the exact vector retrieval
results taken as ground truth. Although convenient to compute, this metric
is only distantly related to the end-to-end accuracy of a full system that
integrates vector search. In this paper we focus on the common case where a
hard decision needs to be taken depending on the vector retrieval results, for
example, deciding whether a query image matches a database image or not. We
solve this as a range search task, where all vectors within a certain radius
from the query are returned.
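
To make the range search task above concrete, here is a minimal brute-force sketch in plain Python. It is an illustration only, not the paper's implementation: the function name `range_search`, the choice of Euclidean distance, the inclusive threshold (`<=`), and the toy data are all assumptions.

```python
import math

def range_search(query, database, radius):
    """Return (index, distance) pairs for every vector within `radius` of `query`.

    Assumptions for this sketch: Euclidean distance and an inclusive radius.
    """
    hits = []
    for i, vec in enumerate(database):
        d = math.dist(query, vec)  # Euclidean distance (Python 3.8+)
        if d <= radius:
            hits.append((i, d))
    # Sort by increasing distance so the closest matches come first.
    hits.sort(key=lambda pair: pair[1])
    return hits

# Toy 2-D database (hypothetical data).
database = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (0.5, 0.5)]
query = (0.0, 0.0)

results = range_search(query, database, radius=1.0)
far_results = range_search((10.0, 10.0), database, radius=1.0)
```

Unlike top-k retrieval, the result list has no fixed size: here `results` holds three vectors, while `far_results` is empty because nothing lies within the radius of that query. An empty list can itself be read as the hard decision "no match".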
We show that the value of a range search result can be modeled rigorously
based on the query-to-vector distance. This yields a metric for range search,
RSM, that is both principled and easy to compute without running an end-to-end
evaluation. We apply this metric to the case of image retrieval. We show that
indexing methods that are adapted for top-k retrieval do not necessarily
maximize the RSM. In particular, for inverted-file-based indexes, we show that
visiting a limited set of clusters and encoding vectors compactly yields
near-optimal results.
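
The idea of visiting only a limited set of clusters in an inverted-file index can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: the centroids are hand-picked rather than learned (for example by k-means), vectors are stored uncompressed (so the compact encoding mentioned above is not shown), and the names `build_ivf`, `ivf_range_search`, and `nprobe` are hypothetical choices for this sketch.

```python
import math
from collections import defaultdict

def build_ivf(database, centroids):
    """Assign each database vector to the inverted list of its nearest centroid."""
    lists = defaultdict(list)
    for i, vec in enumerate(database):
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(vec, centroids[j]))
        lists[nearest].append(i)
    return lists

def ivf_range_search(query, database, centroids, lists, radius, nprobe):
    """Range search restricted to the `nprobe` clusters closest to the query."""
    # Rank clusters by centroid distance to the query.
    order = sorted(range(len(centroids)),
                   key=lambda j: math.dist(query, centroids[j]))
    hits = []
    for c in order[:nprobe]:
        for i in lists.get(c, []):
            d = math.dist(query, database[i])
            if d <= radius:  # inclusive radius is an assumption
                hits.append((i, d))
    return sorted(hits, key=lambda pair: pair[1])

# Hypothetical toy data: two clusters on a line.
centroids = [(0.0, 0.0), (10.0, 0.0)]
database = [(1.0, 0.0), (4.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
lists = build_ivf(database, centroids)

query = (4.8, 0.0)
one = ivf_range_search(query, database, centroids, lists, radius=1.5, nprobe=1)
both = ivf_range_search(query, database, centroids, lists, radius=1.5, nprobe=2)
```

With `nprobe=1` only the cluster nearest to the query is scanned, so the in-radius vector stored in the other cluster is missed. With `nprobe=2` both are found, at the cost of scanning more vectors. The trade-off between clusters visited and results recovered is what an index-level evaluation has to weigh.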