# Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)

Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014, pp. 2321–2329

Abstract

We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS). Searching with (un-normalized) inner product as the underlying similarity measure is a known difficult problem and finding hashing schemes for MIPS was considered hard. While the existing Locality Sensitive Hashing (LSH) fra...

Introduction

**Introduction and Motivation**

The focus of this paper is on the problem of Maximum Inner Product Search (MIPS): given a query vector q, find the vector x in a collection that maximizes the inner product $q^T x$.

- The authors show that the current LSH framework can be relaxed to allow asymmetric hash functions, which can solve MIPS efficiently.
- The authors' construction of asymmetric LSH rests on the fact that the original MIPS problem, after asymmetric transformations, reduces to approximate near neighbor search in the classical setting.
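The reduction in the last bullet can be checked numerically. The sketch below assumes the transformations as I recall them from the paper (this summary does not state them): data vectors, scaled so that $\|x\|_2 \le U < 1$, are extended with $\|x\|^2, \|x\|^4, \dots, \|x\|^{2^m}$, and unit-norm queries are extended with $m$ copies of $1/2$. Under that assumption, $\|Q(q) - P(x)\|^2 = 1 + m/4 - 2q^T x + \|x\|^{2^{m+1}}$, so a large inner product corresponds to a small L2 distance. The function names and the values of `dim`, `m`, and `U` are illustrative.

```python
import math
import random


def preprocess_transform(x, m):
    """P(x): append ||x||^2, ||x||^4, ..., ||x||^(2^m) to a data vector."""
    norm_sq = sum(v * v for v in x)
    # norm_sq ** (2 ** i) equals ||x||^(2^(i+1)) for i = 0 .. m-1
    return list(x) + [norm_sq ** (2 ** i) for i in range(m)]


def query_transform(q, m):
    """Q(q): append m copies of 1/2 to a unit-norm query vector."""
    return list(q) + [0.5] * m


def squared_l2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def scaled_to_norm(v, target):
    norm = math.sqrt(sum(c * c for c in v))
    return [target * c / norm for c in v]


random.seed(0)
dim, m, U = 8, 3, 0.83  # illustrative values, not taken from this summary

q = scaled_to_norm([random.gauss(0, 1) for _ in range(dim)], 1.0)  # ||q|| = 1
x = scaled_to_norm([random.gauss(0, 1) for _ in range(dim)], U)    # ||x|| = U

inner = sum(a * b for a, b in zip(q, x))
lhs = squared_l2(query_transform(q, m), preprocess_transform(x, m))
rhs = 1 + m / 4 - 2 * inner + U ** (2 ** (m + 1))
print(f"transformed squared distance: {lhs:.12f}")
print(f"closed-form expression:       {rhs:.12f}")
```

Because $U < 1$, the residual term $\|x\|^{2^{m+1}}$ shrinks quickly as $m$ grows, which is why the transformed distance is approximately a decreasing function of the inner product alone.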

Highlights

- The focus of this paper is on the problem of Maximum Inner Product Search (MIPS).
- Popular techniques for c-NN (c-approximate near neighbor search) are often based on Locality Sensitive Hashing (LSH) [12], a family of functions with the property that more similar objects in the domain of these functions have a higher probability of colliding in the range space than less similar ones.
- This paper uses L2LSH, the LSH family for L2 distance, as the base hash function to build an Asymmetric LSH (ALSH), i.e., L2-ALSH, for MIPS.
- We develop ALSH, which generalizes the existing LSH framework by applying asymmetric transformations to the input query vector and the data vectors in the repository.
- We present an implementation of ALSH by proposing a novel transformation which converts the original inner products into L2 distances in the transformed space.
- We demonstrate, both theoretically and empirically, that this implementation of ALSH provides a provably efficient as well as practical solution to MIPS.
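The LSH property described above (closer points collide more often) can be illustrated with a small simulation. This is a minimal sketch of the standard L2LSH function $h_{a,b}(x) = \lfloor (a^T x + b)/r \rfloor$ with $a$ drawn from a standard Gaussian and $b$ uniform on $[0, r]$; the helper names, the bucket width `r = 2.5`, and the test distances are assumptions chosen for illustration.

```python
import math
import random


def make_l2_hash(dim, r, rng):
    """Draw one L2LSH function h(x) = floor((a.x + b) / r)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, r)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / r)

    return h


def collision_rate(x, y, dim, r, trials, rng):
    """Fraction of independently drawn hash functions on which x and y collide."""
    hits = 0
    for _ in range(trials):
        h = make_l2_hash(dim, r, rng)
        hits += h(x) == h(y)
    return hits / trials


rng = random.Random(0)
dim, r = 16, 2.5  # illustrative values

base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
near = [v + 0.1 / math.sqrt(dim) for v in base]  # L2 distance 0.1 from base
far = [v + 3.0 / math.sqrt(dim) for v in base]   # L2 distance 3.0 from base

near_rate = collision_rate(base, near, dim, r, 2000, rng)
far_rate = collision_rate(base, far, dim, r, 2000, rng)
print(f"collision rate at distance 0.1: {near_rate:.3f}")
print(f"collision rate at distance 3.0: {far_rate:.3f}")
```

In ALSH the same hash function is applied to differently transformed inputs, so the query and the data vectors are not hashed symmetrically even though the underlying family is plain L2LSH.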

Results

- Based on this key observation, the authors provide an explicit construction of an asymmetric hash function, leading to the first provably sublinear query time hashing algorithm for approximate similarity search with inner product as the similarity.
- In Section 3.3, the authors explicitly show a construction of an asymmetric locality sensitive hash function for solving MIPS.
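- Theorem 2: Given a family of hash functions $\mathcal{H}$ and the associated query and preprocessing transformations $Q$ and $P$, which is $(S_0, cS_0, p_1, p_2)$-sensitive, one can construct a data structure for c-NN with $O(n^{\rho} \log n)$ query time and space $O(n^{1+\rho})$, where $\rho = \log p_1 / \log p_2$.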
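- Applying Theorem 2, the authors can construct data structures with worst case $O(n^{\rho} \log n)$ query time guarantees for c-approximate MIPS, where $\rho = \dfrac{\log F_r\left(\sqrt{1 + m/4 - 2S_0 + U^{2^{m+1}}}\right)}{\log F_r\left(\sqrt{1 + m/4 - 2cS_0}\right)}$ and $F_r(d)$ is the L2LSH collision probability at distance $d$ with bucket width $r$.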
- Theorem 4 (Approximate MIPS is Efficient): For the problem of c-approximate MIPS with $\|q\|_2 = 1$, one can construct a data structure having $O(n^{\rho^*} \log n)$ query time and space $O(n^{1+\rho^*})$, where $\rho^* < 1$ is the solution to constraint optimization (14).
- Just like in the typical LSH framework, the value of $\rho^*$ in Theorem 4 depends on the c-approximate instance the authors aim to solve, which requires knowing the similarity threshold $S_0$ and the approximation ratio $c$.
- Theorem 5 (Unconditional Approximate MIPS is Efficient): For the problem of c-approximate MIPS in a bounded space, one can construct a data structure having $O(n^{\rho_u^*} \log n)$ query time and space $O(n^{1+\rho_u^*})$, where $\rho_u^* < 1$ is the solution to constraint optimization (14).
- The authors evaluate the proposed ALSH scheme for the MIPS problem on two popular collaborative filtering datasets on the task of item recommendations: (i) Movielens(10M), and (ii) Netflix.
- Given a user $i$ and its corresponding user vector $u_i$, the authors compute the top-10 gold standard items based on the actual inner products $u_i^T v_j$ for all $j$.
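To make the exponent $\rho$ from the theorems above concrete, the sketch below evaluates it for one parameter setting. It assumes the standard closed form of the L2LSH collision probability, $F_r(d) = 1 - 2\Phi(-r/d) - \frac{2}{\sqrt{2\pi}\,(r/d)}\left(1 - e^{-(r/d)^2/2}\right)$, and the expression $\rho = \log F_r\big(\sqrt{1 + m/4 - 2S_0 + U^{2^{m+1}}}\big) / \log F_r\big(\sqrt{1 + m/4 - 2cS_0}\big)$ as I recall it from the paper. The function names and the chosen values of `U`, `m`, `r`, `S0`, and `c` are illustrative assumptions, and no optimization over the parameters is attempted.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def l2lsh_collision_prob(d, r):
    """F_r(d): probability that two points at L2 distance d share an L2LSH bucket."""
    t = r / d
    return (1.0 - 2.0 * normal_cdf(-t)
            - 2.0 / (math.sqrt(2.0 * math.pi) * t) * (1.0 - math.exp(-t * t / 2.0)))


def alsh_rho(S0, c, U, m, r):
    """Return (rho, p1, p2) for one c-approximate MIPS instance."""
    d_near = math.sqrt(1.0 + m / 4.0 - 2.0 * S0 + U ** (2 ** (m + 1)))
    d_far = math.sqrt(1.0 + m / 4.0 - 2.0 * c * S0)
    p1 = l2lsh_collision_prob(d_near, r)  # bound when q.x >= S0
    p2 = l2lsh_collision_prob(d_far, r)   # bound when q.x <= c * S0
    return math.log(p1) / math.log(p2), p1, p2


U, m, r = 0.83, 3, 2.5  # illustrative parameter choice
S0, c = 0.9 * U, 0.5    # hypothetical similarity threshold and approximation ratio

rho, p1, p2 = alsh_rho(S0, c, U, m, r)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, rho = {rho:.4f}")
```

A value of $\rho$ below 1 is what makes the $O(n^{\rho} \log n)$ query time sublinear; the optimized exponent $\rho^*$ in Theorem 4 would be obtained by minimizing this quantity over the free parameters, which this sketch does not do.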

Conclusion

- The authors implemented the standard (K, L)-parameterized bucketing algorithm [1] for retrieving top-50 items based on PureSVD procedure using the proposed ALSH hash function and the two baselines: SRP and L2LSH.
- The authors develop ALSH, which generalizes the existing LSH framework by applying asymmetric transformations to the input query vector and the data vectors in the repository.
- Other explicit constructions of ALSH, for example, ALSH through cosine similarity or ALSH through resemblance, will be presented in follow-up technical reports.
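The (K, L)-parameterized bucketing retrieval mentioned above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it builds L hash tables, each keyed by K concatenated L2LSH values, indexes the transformed data vectors, probes with the transformed query, and re-ranks the candidates by their true inner products. The class name, the transformations (recalled from the paper, not stated in this summary), and all parameter values are assumptions.

```python
import math
import random
from collections import defaultdict


class AsymmetricLSHIndex:
    """Hypothetical (K, L) bucketing index for approximate MIPS."""

    def __init__(self, dim, m=3, r=2.5, K=6, L=16, seed=0):
        rng = random.Random(seed)
        self.m, self.r = m, r
        self.vectors = []
        # One (a, b) pair per hash function; K functions per table, L tables.
        self.projections = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim + m)], rng.uniform(0.0, r))
             for _ in range(K)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, table_id, v):
        # Concatenate K L2LSH values into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.r)
            for a, b in self.projections[table_id]
        )

    def _data_transform(self, x):
        norm_sq = sum(v * v for v in x)
        return list(x) + [norm_sq ** (2 ** i) for i in range(self.m)]

    def _query_transform(self, q):
        return list(q) + [0.5] * self.m

    def add(self, x):
        idx = len(self.vectors)
        self.vectors.append(list(x))
        px = self._data_transform(x)
        for t, table in enumerate(self.tables):
            table[self._key(t, px)].append(idx)
        return idx

    def query(self, q, top_k):
        tq = self._query_transform(q)
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, tq), ()))

        def score(i):
            return sum(a * b for a, b in zip(q, self.vectors[i]))

        # Re-rank the retrieved candidates by the actual inner product.
        return sorted(candidates, key=score, reverse=True)[:top_k]


rng = random.Random(1)
dim = 10
raw = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
max_norm = max(math.sqrt(sum(v * v for v in x)) for x in raw)
data = [[0.83 * v / max_norm for v in x] for x in raw]  # all norms <= U = 0.83

index = AsymmetricLSHIndex(dim)
for x in data:
    index.add(x)

q_norm = math.sqrt(sum(v * v for v in raw[0]))
q = [v / q_norm for v in raw[0]]  # unit-norm query
top = index.query(q, top_k=5)
scores = [sum(a * b for a, b in zip(q, data[i])) for i in top]
print(list(zip(top, [round(s, 4) for s in scores])))
```

Larger K makes each bucket more selective, while larger L raises the chance that a high inner product item shares at least one bucket with the query; the values used here are arbitrary.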

Funding

- The research is partially supported by NSF-DMS-1444124, NSF-III-1360971, NSF-Bigdata1419210, ONR-N00014-13-1-0764, and AFOSR-FA9550-13-1-0137

Reference

- A. Andoni and P. Indyk. E2LSH: Exact Euclidean locality sensitive hashing. Technical report, 2004.
- A. Z. Broder. On the resemblance and containment of documents. In the Compression and Complexity of Sequences, pages 21–29, Positano, Italy, 1997.
- M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380–388, Montreal, Quebec, Canada, 2002.
- P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems, pages 39–46. ACM, 2010.
- R. R. Curtin, A. G. Gray, and P. Ram. Fast exact max-kernel search. In SDM, pages 1–9, 2013.
- M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, pages 253–262, Brooklyn, NY, 2004.
- T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 1814–1821. IEEE, 2013.
- P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627–1645, 2010.
- J. H. Friedman and J. W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23(9):881–890, 1974.
- M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of ACM, 42(6):1115–1145, 1995.
- S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(14):321–350, 2012.
- P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pages 604–613, Dallas, TX, 1998.
- N. Koenigstein, P. Ram, and Y. Shavitt. Efficient retrieval of recommendations in a matrix factorization framework. In CIKM, pages 535–544, 2012.
- Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems.
- P. Li and A. C. König. Theory and applications of b-bit minwise hashing. Commun. ACM, 2011.
- P. Li, M. Mitzenmacher, and A. Shrivastava. Coding for random projections. In ICML, 2014.
- P. Li, M. Mitzenmacher, and A. Shrivastava. Coding for random projections and approximate near neighbor search. Technical report, arXiv:1403.8144, 2014.
- B. Neyshabur and N. Srebro. A simpler and better LSH for maximum inner product search (MIPS). Technical report, arXiv:1410.5518, 2014.
- P. Ram and A. G. Gray. Maximum inner-product search using cone trees. In KDD, pages 931–939, 2012.
- A. Shrivastava and P. Li. Beyond pairwise: Provably fast algorithms for approximate k-way similarity search. In NIPS, Lake Tahoe, NV, 2013.
- A. Shrivastava and P. Li. Asymmetric minwise hashing. Technical report, 2014.
- A. Shrivastava and P. Li. An improved scheme for asymmetric LSH. Technical report, arXiv:1410.5410, 2014.
- A. Shrivastava and P. Li. In defense of minhash over simhash. In AISTATS, 2014.
- R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194–205, 1998.
