CLAM-Accelerated K-Nearest Neighbors Entropy-Scaling Search of Large High-Dimensional Datasets via an Actualization of the Manifold Hypothesis

Morgan E. Prior, Thomas J. Howard III, Oliver McLaughlin, Najib Ishaq, Noah M. Daniels

CoRR (2023)

Abstract
Many fields are experiencing a Big Data explosion, with data collection rates outpacing the rate of computing performance improvements predicted by Moore's Law. Researchers are often interested in similarity search on such data. We present CAKES (CLAM-Accelerated $K$-NN Entropy Scaling Search), a novel algorithm for $k$-nearest-neighbor ($k$-NN) search that leverages geometric and topological properties inherent in large datasets. CAKES assumes the manifold hypothesis and performs best when data occupy a low-dimensional manifold, even if that manifold is embedded in a very high-dimensional space. We demonstrate speedups ranging from hundreds to tens of thousands of times over state-of-the-art approaches such as FAISS and HNSW when benchmarked on 5 standard datasets. Unlike locality-sensitive hashing approaches, CAKES can work with any user-defined distance function. When data occupy a metric space, CAKES exhibits perfect recall.
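To make the terms "user-defined distance function" and "perfect recall" concrete, the following is a minimal sketch of the exhaustive linear-scan baseline against which recall is usually measured. It is not the CAKES algorithm, whose internals the abstract does not describe. The names `knn_linear_scan`, `recall`, and `hamming` are hypothetical and chosen only for illustration.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knn_linear_scan(
    query: T,
    data: Sequence[T],
    k: int,
    distance: Callable[[T, T], float],
) -> List[Tuple[float, int]]:
    """Exact k-NN by exhaustive scan (illustrative baseline, not CAKES).

    Returns up to k (distance, index) pairs, nearest first. Ties are
    broken by the lower index because tuples compare element by element.
    """
    scored = ((distance(query, item), i) for i, item in enumerate(data))
    return heapq.nsmallest(k, scored)


def recall(found: List[Tuple[float, int]], truth: List[Tuple[float, int]]) -> float:
    """Fraction of the true neighbor indices that appear in `found`."""
    truth_ids = {i for _, i in truth}
    if not truth_ids:
        return 1.0
    found_ids = {i for _, i in found}
    return len(truth_ids & found_ids) / len(truth_ids)


def hamming(a: str, b: str) -> int:
    """A user-defined distance on strings: mismatches plus length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TTTT", "ACCT"]
    truth = knn_linear_scan("ACGT", data, k=2, distance=hamming)
    print(truth)
    print(recall(truth, truth))
```

Under these assumptions, a search method has perfect recall when `recall(found, truth)` equals 1.0 for every query, where `truth` comes from the exhaustive scan. Any callable taking two items and returning a number can be passed as `distance`.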
Keywords
datasets, manifold hypothesis, clam-accelerated, k-nearest, entropy-scaling, high-dimensional