DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search

56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023(2023)

引用 0|浏览32
暂无评分
摘要
Embedding retrieval is a crucial task for recommendation systems. Graph-based approximate nearest neighbor search (GANNS) is the most commonly used method for retrieval, and achieves the best performance on billion-scale datasets. Unfortunately, the existing CPU- and GPU-based GANNS systems are difficult to optimize the throughput under the latency constraints on billion-scale datasets, due to the underutilized local memory bandwidth (5-45%) and the expensive remote data access overhead (similar to 85% of the total latency). In this paper, we first introduce a practically ideal GANNS architecture for billion-scale datasets, which facilitates a detailed analysis of the challenges and characteristics of distributed GANNS systems. Then, at the architecture level, we propose DF-GAS, a Distributed FPGA-as-a-Service (FPaaS) architecture for accelerating billionscale Graph-based Approximate nearest neighbor Search. DF-GAS uses a feature-packing memory access engine and a data prefetching and delayed processing scheme to increase local memory bandwidth by 36-42% and reduce remote data access overhead by 76.2%, respectively. At the system level, we exploit the "full-graph + sub-graph" hybrid parallel search scheme on distributed FPaaS system. It achieves million-level query-per-second with sub-millisecond latency on billion-scale GANNS for the first time. Extensive evaluations on million-scale and billion-scale datasets show that DF-GAS achieves an average of 55.4x, 32.2x, 5.4x, and 4.4x better latency-bounded throughput than CPUs, GPUs, and two state-of-the-art ANNS architectures, i.e., ANNA [23] and Vstore [27], respectively.
更多
查看译文
关键词
Embedding Retrieval,FPGA,Distributed Architecture,Graph,Approximate,Nearest Neighbor Search
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要