Accelerating massive queries of approximate nearest neighbor search on high-dimensional data

Knowledge and Information Systems (2023)

Abstract
Approximate nearest neighbor (ANN) search on high-dimensional data is a fundamental operation in many applications. In this paper, we study massive queries of ANN (MQ-ANN) search, which processes a large number of queries simultaneously. To improve throughput, we combine the parallel capacity of multi-core CPUs with the filtering power of state-of-the-art index methods, namely proximity graphs. However, the only existing solution that exploits proximity graphs to handle MQ-ANN in parallel is query view, which simply assigns each query to a hardware thread but suffers from numerous cache misses. As the first attempt, we design efficient methods for MQ-ANN with proximity graphs and propose a novel scheduling mechanism called bridge view, which shares the same data access across multiple queries to reduce cache misses. Moreover, we extend our method to handle MQ-ANN on large-scale data sets (e.g., 10^8 points). Finally, we conduct extensive experiments on real data sets to demonstrate the advantages of our method. According to our experimental results, bridge view significantly outperforms query view in various settings; in particular, bridge view with 8 hardware threads even outperforms query view with 24 hardware threads.
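To make the contrast between a per-query schedule and shared data access concrete, here is a minimal Python sketch. It assumes a standard best-first (beam) search over a proximity graph, here a brute-force 8-NN graph on toy data. `beam_search` and `batched_beam_search` are hypothetical names, and the batching scheme only illustrates the general idea of sharing data accesses across queries; it is not the paper's bridge view algorithm. `beam_search` is the loop that a query-view schedule would run independently on each hardware thread. `batched_beam_search` advances all queries in lock-step and groups their neighbor requests by node id, so each vector is read once per round and compared against every query that needs it. Python cannot show cache effects, so the point is the access pattern. On tie-free data, both functions should return the same neighbor lists.

```python
import heapq
from collections import defaultdict

import numpy as np


def beam_search(graph, data, q, entry, ef):
    """Best-first search on a proximity graph for ONE query.

    Under a query-view schedule, each query would run this loop on its own
    hardware thread, fetching node vectors independently of other queries.
    Returns up to ef (squared_distance, node_id) pairs, nearest first.
    """
    d0 = float(np.sum((data[entry] - q) ** 2))
    visited = {entry}
    cand = [(d0, entry)]          # min-heap: nodes still to expand
    res = [(-d0, entry)]          # max-heap via negation: best ef nodes so far
    while cand:
        d, u = heapq.heappop(cand)
        if d > -res[0][0]:        # nearest candidate is worse than worst result
            break
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            dv = float(np.sum((data[v] - q) ** 2))
            if len(res) < ef or dv < -res[0][0]:
                heapq.heappush(cand, (dv, v))
                heapq.heappush(res, (-dv, v))
                if len(res) > ef:
                    heapq.heappop(res)
    return sorted((-nd, v) for nd, v in res)


def batched_beam_search(graph, data, queries, entry, ef):
    """Hypothetical lock-step variant (illustration only, not bridge view).

    Every active query expands one node per round; neighbor requests are
    grouped by node id, so each data[v] is read once per round and compared
    against all queries that requested it.
    """
    n_q = len(queries)
    d0 = np.sum((queries - data[entry]) ** 2, axis=1)
    visited = [{entry} for _ in range(n_q)]
    cand = [[(float(d0[i]), entry)] for i in range(n_q)]
    res = [[(-float(d0[i]), entry)] for i in range(n_q)]
    active = set(range(n_q))
    while active:
        requests = defaultdict(list)      # node id -> query ids that need it
        for i in sorted(active):
            if not cand[i]:
                active.discard(i)
                continue
            d, u = heapq.heappop(cand[i])
            if d > -res[i][0][0]:         # this query's search has converged
                active.discard(i)
                continue
            for v in graph[u]:
                if v not in visited[i]:
                    visited[i].add(v)
                    requests[v].append(i)
        for v, qs in requests.items():    # one access to data[v] per round
            dists = np.sum((queries[qs] - data[v]) ** 2, axis=1)
            for i, dv in zip(qs, dists):
                dv = float(dv)
                if len(res[i]) < ef or dv < -res[i][0][0]:
                    heapq.heappush(cand[i], (dv, v))
                    heapq.heappush(res[i], (-dv, v))
                    if len(res[i]) > ef:
                        heapq.heappop(res[i])
    return [sorted((-nd, v) for nd, v in r) for r in res]


# Toy data and an 8-NN graph built by brute force (illustration only).
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 16))
queries = rng.standard_normal((20, 16))
d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
graph = [[int(j) for j in np.argsort(row)[1:9]] for row in d2]

batched = batched_beam_search(graph, data, queries, entry=0, ef=10)
single = [beam_search(graph, data, q, entry=0, ef=10) for q in queries]
print([v for _, v in batched[0]] == [v for _, v in single[0]])
```

In a native implementation, the grouped loop over `requests` is where a shared vector could stay in cache while it is compared against several queries. Whether that pays off depends on how much the queries' search paths overlap.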
Keywords
Massive queries, Approximate nearest neighbor search, High-dimensional data, Proximity graphs, Parallel algorithms