Randomized Embeddings with Slack and High-Dimensional Approximate Nearest Neighbor

ACM Transactions on Algorithms (2018)

Cited by 9 | Views 24
Abstract
Approximate nearest neighbor search (ε-ANN) in high dimensions has mainly been addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension, sublinear query time, but subquadratic space requirements. We introduce a new "low-quality" embedding for metric spaces, requiring only that, for some query, there exists an approximate nearest neighbor among the pre-images of its k > 1 approximate nearest neighbors in the target space. In Euclidean spaces, we employ random projections to a dimension inversely proportional to k. Our approach extends to the decision problem with witness of checking whether there exists an approximate near neighbor; this also implies a solution for ε-ANN. After dimension reduction, we store the points in a uniform grid of side length ε/√d′, where d′ is the reduced dimension. Given a query, we explore the cells intersecting the unit ball around it. This data structure requires linear space and query time in O(d·n^ρ), where ρ ≈ 1 − ε²/log(1/ε), n denotes the input cardinality, and d the dimension of the ambient space. The bounds are improved for doubling subsets via r-nets. We present our C++ implementation for ε-ANN and experiments for d ≤ 960 and n ≤ 10^6, using synthetic and real datasets, which confirm the theoretical analysis and typically yield better practical performance. We compare to FALCONN, the state-of-the-art implementation of multi-probe LSH: our prototype software is essentially comparable in terms of preprocessing time, query time, and storage usage.
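
To make the construction concrete, here is a minimal C++ sketch of the two ingredients the abstract describes: a Gaussian random projection to d′ dimensions and a uniform grid of side ε/√d′ whose cells near the query's projection are scanned for candidates. Everything here (the class name GridANN, the fixed seed, the box-shaped cell scan) is an illustrative assumption, not the authors' implementation.

```cpp
// Hypothetical sketch of the grid-based ANN idea from the abstract:
// project points to d' dimensions, bucket them in a uniform grid of
// side eps/sqrt(d'), and answer a query by scanning grid cells near
// the query's projection. Candidates are verified in the original space.
#include <cmath>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

using Point = std::vector<double>;

struct GridANN {
    int d, dPrime;                       // original and reduced dimension
    double side;                         // grid cell side: eps / sqrt(d')
    std::vector<Point> pts;              // original points, for verification
    std::vector<std::vector<double>> G;  // random Gaussian projection matrix
    std::unordered_map<std::string, std::vector<int>> cells;

    GridANN(const std::vector<Point>& points, int dPrime_, double eps)
        : d(points[0].size()), dPrime(dPrime_),
          side(eps / std::sqrt((double)dPrime_)), pts(points) {
        std::mt19937 rng(42);
        std::normal_distribution<double> gauss(0.0, 1.0);
        G.assign(dPrime, std::vector<double>(d));
        for (auto& row : G)
            for (double& g : row) g = gauss(rng) / std::sqrt((double)dPrime);
        for (int i = 0; i < (int)pts.size(); ++i)
            cells[key(cell(project(pts[i])))].push_back(i);
    }

    Point project(const Point& p) const {
        Point q(dPrime, 0.0);
        for (int i = 0; i < dPrime; ++i)
            for (int j = 0; j < d; ++j) q[i] += G[i][j] * p[j];
        return q;
    }
    std::vector<long> cell(const Point& p) const {
        std::vector<long> c(dPrime);
        for (int i = 0; i < dPrime; ++i) c[i] = (long)std::floor(p[i] / side);
        return c;
    }
    static std::string key(const std::vector<long>& c) {
        std::string s;
        for (long v : c) s += std::to_string(v) + ",";
        return s;
    }

    // Scan every cell in the box of radius r around the projected query.
    // This box scan is a simplification of the abstract's "cells
    // intersecting the unit ball"; it is exponential in d', which is
    // tolerable only because the projection targets a small d'.
    int query(const Point& q, double r) const {
        Point pq = project(q);
        std::vector<long> c0 = cell(pq);
        long m = (long)std::ceil(r / side);
        int best = -1;
        double bestDist = 1e300;
        std::vector<long> off(dPrime, -m);  // offset vector, odometer-style
        while (true) {
            std::vector<long> c(dPrime);
            for (int i = 0; i < dPrime; ++i) c[i] = c0[i] + off[i];
            auto it = cells.find(key(c));
            if (it != cells.end())
                for (int idx : it->second) {  // verify in the original space
                    double dist = 0.0;
                    for (int j = 0; j < d; ++j)
                        dist += (pts[idx][j] - q[j]) * (pts[idx][j] - q[j]);
                    if (dist < bestDist) { bestDist = dist; best = idx; }
                }
            int i = 0;  // advance the offset vector like an odometer
            while (i < dPrime && ++off[i] > m) off[i++] = -m;
            if (i == dPrime) break;
        }
        return best;  // index of the best candidate found, or -1
    }
};
```

Under these assumptions, calling query(q, 1.0) mirrors the abstract's exploration of the unit ball around the query; the grid only filters candidates, and all distances are checked in the original d-dimensional space.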
Keywords
Approximate nearest neighbor, Johnson-Lindenstrauss Lemma, curse of dimensionality, doubling dimension, experimental study, randomized embeddings