Robust and efficient locality sensitive hashing for nearest neighbor search in large data sets

NIPS Workshop on Big Learning (BigLearn) (2012)

Abstract
Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, can exhibit query complexity as high as Θ(n) when the underlying data distribution is highly skewed. This is due to imbalance in the number of data points stored per bucket, which leads to slow query times in large data sets. In this paper, we introduce a distribution-free LSH algorithm that addresses this problem by maintaining a nearly uniform number of points per bucket. As a consequence, our algorithm allows one to reduce the number of hash tables, and is hence memory-efficient, while achieving high accuracy. Through extensive experiments, we show that our algorithm retrieves nearest neighbors accurately and faster than other standard LSH algorithms on large data sets, and that it maintains a nearly uniform number of points per bucket.
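The bucket imbalance described above can be reproduced with a small experiment. The sketch below is not the algorithm proposed in the paper, whose mechanism the abstract does not spell out. It contrasts classic sign-based random-projection hashing with a hypothetical data-dependent variant that splits each projection at its median. The function names (`sign_code`, `fit_thresholds`, `threshold_code`, `bucket_sizes`, `demo`), the median rule, and the synthetic skewed data are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import median


def project(point, plane):
    """Dot product of a point with a projection direction."""
    return sum(p * w for p, w in zip(point, plane))


def sign_code(point, planes):
    """Classic random-projection code: one bit per hyperplane through the origin."""
    return tuple(int(project(point, w) > 0.0) for w in planes)


def fit_thresholds(points, planes):
    """Hypothetical data-dependent step: split each projection at its median."""
    return [median(project(x, w) for x in points) for w in planes]


def threshold_code(point, planes, thresholds):
    """Same projections as sign_code, but each bit uses a fitted threshold."""
    return tuple(int(project(point, w) > t) for w, t in zip(planes, thresholds))


def bucket_sizes(codes):
    """Bucket occupancies, largest first."""
    return sorted(Counter(codes).values(), reverse=True)


def demo(n=2000, dim=8, bits=6, seed=0):
    """Compare bucket occupancies on synthetic skewed data (an assumption)."""
    rng = random.Random(seed)
    # Skewed data: about 90% of the points sit in a tight cluster far from the origin.
    center = [5.0] * dim
    points = [
        [c + rng.gauss(0.0, 0.1) for c in center]
        if rng.random() < 0.9
        else [rng.gauss(0.0, 5.0) for _ in range(dim)]
        for _ in range(n)
    ]
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
    thresholds = fit_thresholds(points, planes)
    plain = bucket_sizes(sign_code(x, planes) for x in points)
    balanced = bucket_sizes(threshold_code(x, planes, thresholds) for x in points)
    return plain, balanced


if __name__ == "__main__":
    plain, balanced = demo()
    print("largest bucket, sign hashing:     ", plain[0])
    print("largest bucket, median thresholds:", balanced[0])
```

When one bucket holds most of the data, a query that lands in it must scan nearly all n points, which is the Θ(n) behavior the abstract refers to. Median splits balance each bit individually but do not guarantee uniform buckets jointly, so this sketch only illustrates the direction of the effect.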