Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).

Advances in Neural Information Processing Systems 27 (NIPS 2014), pages 2321–2329, 2014

Abstract

We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS). Searching with (un-normalized) inner product as the underlying similarity measure is a known difficult problem and finding hashing schemes for MIPS was considered hard. While the existing Locality Sensitive Hashing (LSH) framework…

Introduction
  • Introduction and Motivation

    The focus of this paper is on the problem of Maximum Inner Product Search (MIPS): given a query vector q and a collection of data vectors x_1, …, x_n, find the data vector maximizing the inner product q^T x_i (a brute-force reference formulation appears after this list).
  • The authors show that it is possible to relax the current LSH framework to allow asymmetric hash functions which can efficiently solve MIPS.
  • The authors' construction of asymmetric LSH is based on the key observation that the original MIPS problem, after asymmetric transformations, reduces to the problem of approximate near-neighbor search in the classical setting.
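
For reference, the problem statement above can be made concrete with a brute-force formulation of MIPS. This is only an illustrative sketch (the function name and the use of NumPy are assumptions, not the paper's code): a linear scan computes every inner product in O(nd) time per query, which is exactly the cost ALSH is designed to beat.

    import numpy as np

    def mips_bruteforce(query, data):
        # Exact MIPS baseline: compute q^T x_i for every data vector (row of `data`)
        # and return the index of the largest inner product. Cost is O(n * d) per
        # query; the ALSH scheme of this paper targets sublinear query time instead.
        scores = data @ query
        return int(np.argmax(scores))

    # Illustrative usage with random data.
    data = np.random.randn(1000, 64)
    query = np.random.randn(64)
    best_index = mips_bruteforce(query, data)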
Highlights
  • Introduction and Motivation

    The focus of this paper is on the problem of Maximum Inner Product Search (MIPS)
  • Popular techniques for c-NN are often based on Locality Sensitive Hashing (LSH) [12], which is a family of functions with the nice property that more similar objects in the domain of these functions have a higher probability of colliding in the range space than less similar ones
  • This paper focuses on using L2LSH to convert near neighbor search of L2 distance into an Asymmetric LSH (ALSH) (i.e., L2-ALSH) for MIPS
  • We develop ALSH, which generalizes the existing LSH framework by applying asymmetric transformations to the input query vector and the data vectors in the repository
  • We present an implementation of ALSH by proposing a novel transformation that converts the original inner products into L2 distances in the transformed space (a minimal sketch of this transformation follows this list)
  • We show, both theoretically and empirically, that this implementation of ALSH provides a provably efficient as well as practical solution to MIPS
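
The bullets above describe the L2-ALSH construction; the sketch below spells out the asymmetric transformations P and Q in Python. It is a minimal illustration under stated assumptions (the function names are invented; the parameter choices m = 3 and U = 0.83 follow the paper's recommended setting), not the authors' implementation.

    import numpy as np

    M = 3        # number of appended components (m in the paper)
    U = 0.83     # data vectors are rescaled so that ||x||_2 <= U < 1

    def preprocess(x, max_norm):
        # P(x): rescale so that ||x||_2 <= U, then append ||x||^2, ||x||^4, ..., ||x||^(2^M).
        x = (U / max_norm) * np.asarray(x, dtype=float)
        sq = np.dot(x, x)                                   # ||x||^2
        extras = [sq ** (2 ** (i - 1)) for i in range(1, M + 1)]
        return np.concatenate([x, extras])

    def query_transform(q):
        # Q(q): normalize the query to unit L2 norm, then append M constants equal to 1/2.
        q = np.asarray(q, dtype=float)
        q = q / np.linalg.norm(q)
        return np.concatenate([q, 0.5 * np.ones(M)])

    # Key identity behind the reduction:
    #   ||Q(q) - P(x)||^2 = 1 + M/4 - 2 * q^T x + ||x||^(2^(M+1)).
    # Since ||x|| <= U < 1, the last term vanishes as M grows, so the L2 nearest
    # neighbor of Q(q) among the P(x_i) is (approximately) the MIPS solution.

Standard L2-LSH hash functions are then applied to Q(q) at query time and to P(x_i) at indexing time, so that an off-the-shelf L2 near-neighbor index answers MIPS queries.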
Results
  • Based on this key observation, the authors provide an example of explicit construction of asymmetric hash function, leading to the first provably sublinear query time hashing algorithm for approximate similarity search with inner product as the similarity.
  • In Section 3.3, the authors explicitly show a construction of asymmetric locality sensitive hash function for solving MIPS.
  • Theorem 2: Given a family of hash functions H and the associated preprocessing and query transformations P and Q, which is (S0, cS0, p1, p2)-sensitive, one can construct a data structure for c-NN with O(n^ρ log n) query time and O(n^(1+ρ)) space, where ρ = log(1/p1)/log(1/p2).
  • Applying Theorem 2, the authors construct data structures with worst-case O(n^ρ log n) query time guarantees for c-approximate MIPS, where ρ = log F_r(√(1 + m/4 − 2S0 + U^(2^(m+1)))) / log F_r(√(1 + m/4 − 2cS0)), with F_r denoting the collision-probability function of the standard L2-LSH with bin width r (a numerical sketch of this quantity follows this list).
  • Theorem 4 (Approximate MIPS is Efficient): For the problem of c-approximate MIPS with ||q||_2 = 1, one can construct a data structure having O(n^ρ* log n) query time and O(n^(1+ρ*)) space, where ρ* < 1 is the solution to the constrained optimization (14).
  • Just like in the typical LSH framework, the value of ρ* in Theorem 4 depends on the c-approximate instance the authors aim to solve, which requires knowing the similarity threshold S0 and the approximation ratio c.
  • Theorem 5 (Unconditional Approximate MIPS is Efficient): For the problem of c-approximate MIPS in a bounded space, one can construct a data structure having O(n^ρ*_u log n) query time and O(n^(1+ρ*_u)) space, where ρ*_u < 1 is the solution to the constrained optimization (14).
  • The authors evaluate the proposed ALSH scheme for the MIPS problem on two popular collaborative filtering datasets on the task of item recommendations: (i) Movielens(10M), and (ii) Netflix.
  • Given a user i and the corresponding user vector u_i, the authors compute the top-10 gold-standard items based on the actual inner products u_i^T v_j for all j.
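
To make the ρ expression in the Results bullets concrete, the sketch below evaluates it for fixed parameters using the collision-probability function F_r of the standard L2-LSH (Datar et al.). The helper names and the example values S0 = 0.5 and c = 0.5 are assumptions for illustration; Theorem 4 corresponds to minimizing this quantity over m, U, and r.

    import numpy as np
    from scipy.stats import norm

    def collision_prob(dist, r):
        # F_r(d): collision probability of the standard L2-LSH h(v) = floor((a.v + b)/r)
        # at Euclidean distance d (Datar et al.).
        z = r / dist
        return (1.0 - 2.0 * norm.cdf(-z)
                - (2.0 / (np.sqrt(2.0 * np.pi) * z)) * (1.0 - np.exp(-z * z / 2.0)))

    def l2_alsh_rho(S0, c, m, U, r):
        # rho = log F_r(sqrt(1 + m/4 - 2*S0 + U^(2^(m+1)))) / log F_r(sqrt(1 + m/4 - 2*c*S0))
        p1 = collision_prob(np.sqrt(1.0 + m / 4.0 - 2.0 * S0 + U ** (2 ** (m + 1))), r)
        p2 = collision_prob(np.sqrt(1.0 + m / 4.0 - 2.0 * c * S0), r)
        return np.log(p1) / np.log(p2)

    # Illustrative values only; the paper recommends m = 3, U = 0.83, r = 2.5.
    print(l2_alsh_rho(S0=0.5, c=0.5, m=3, U=0.83, r=2.5))   # prints a value < 1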
Conclusion
  • The authors implemented the standard (K, L)-parameterized bucketing algorithm [1] for retrieving top-50 items based on the PureSVD procedure, using the proposed ALSH hash function and the two baselines, SRP and L2LSH (a minimal sketch of the bucketing scheme follows this list).
  • The authors develop ALSH, which generalizes the existing LSH framework by applying asymmetric transformations to the input query vector and the data vectors in the repository.
  • Other explicit constructions of ALSH, for example ALSH through cosine similarity or ALSH through resemblance, will be presented in follow-up technical reports
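
The (K, L)-parameterized bucketing referenced above can be sketched as follows, assuming the P/Q transforms from the earlier sketch have already been applied to the data and the query; the table layout and function names are illustrative assumptions, not the authors' code.

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)

    def make_table_hash(dim, K, r=2.5):
        # One table's key: K independent L2-LSH values h(v) = floor((a.v + b)/r),
        # concatenated into a tuple.
        A = rng.standard_normal((K, dim))
        b = rng.uniform(0.0, r, size=K)
        return lambda v: tuple(np.floor((A @ v + b) / r).astype(int))

    def build_index(transformed_data, K, L):
        # Index every P(x_i) into L independent hash tables.
        dim = transformed_data.shape[1]
        hashes = [make_table_hash(dim, K) for _ in range(L)]
        tables = [defaultdict(list) for _ in range(L)]
        for i, px in enumerate(transformed_data):
            for h, table in zip(hashes, tables):
                table[h(px)].append(i)
        return hashes, tables

    def probe(transformed_query, hashes, tables):
        # Collect the union of the L buckets that Q(q) falls into.
        candidates = set()
        for h, table in zip(hashes, tables):
            candidates.update(table.get(h(transformed_query), []))
        return candidates

At query time, the candidate indices returned by probe() can then be re-ranked by their true inner products with the original query to produce the final top-ranked items.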
Funding
  • The research is partially supported by NSF-DMS-1444124, NSF-III-1360971, NSF-Bigdata1419210, ONR-N00014-13-1-0764, and AFOSR-FA9550-13-1-0137
References
  • A. Andoni and P. Indyk. E2LSH: Exact Euclidean locality sensitive hashing. Technical report, 2004.
  • A. Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, pages 21–29, Positano, Italy, 1997.
  • M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380–388, Montreal, Quebec, Canada, 2002.
  • P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 39–46. ACM, 2010.
  • R. R. Curtin, A. G. Gray, and P. Ram. Fast exact max-kernel search. In SDM, pages 1–9, 2013.
  • M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, pages 253–262, Brooklyn, NY, 2004.
  • T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 1814–1821. IEEE, 2013.
  • P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627–1645, 2010.
  • J. H. Friedman and J. W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23(9):881–890, 1974.
  • M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
  • S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(14):321–350, 2012.
  • P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pages 604–613, Dallas, TX, 1998.
  • N. Koenigstein, P. Ram, and Y. Shavitt. Efficient retrieval of recommendations in a matrix factorization framework. In CIKM, pages 535–544, 2012.
  • Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
  • P. Li and A. C. Konig. Theory and applications of b-bit minwise hashing. Commun. ACM, 2011.
  • P. Li, M. Mitzenmacher, and A. Shrivastava. Coding for random projections. In ICML, 2014.
  • P. Li, M. Mitzenmacher, and A. Shrivastava. Coding for random projections and approximate near neighbor search. Technical report, arXiv:1403.8144, 2014.
  • B. Neyshabur and N. Srebro. A simpler and better LSH for maximum inner product search (MIPS). Technical report, arXiv:1410.5518, 2014.
  • P. Ram and A. G. Gray. Maximum inner-product search using cone trees. In KDD, pages 931–939, 2012.
  • A. Shrivastava and P. Li. Beyond pairwise: Provably fast algorithms for approximate k-way similarity search. In NIPS, Lake Tahoe, NV, 2013.
  • A. Shrivastava and P. Li. Asymmetric minwise hashing. Technical report, 2014.
  • A. Shrivastava and P. Li. An improved scheme for asymmetric LSH. Technical report, arXiv:1410.5410, 2014.
  • A. Shrivastava and P. Li. In defense of minhash over simhash. In AISTATS, 2014.
  • R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194–205, 1998.