Multi-Resolution Hashing for Fast Pairwise Summations

2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (2018)

Abstract
A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X ⊂ ℝ^d and a pairwise function w: ℝ^d × ℝ^d → [0,1], we study the problem of designing a data structure that enables sublinear-time approximation of the summation Z_w(y) = (1/|X|) ∑_{x∈X} w(x,y) for any query y ∈ ℝ^d. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is a collection of T ≥ 1 hashing schemes with collision probabilities p_1, …, p_T such that sup_{t∈[T]} p_t(x,y) = Θ(√(w(x,y))). This leads to a data structure that approximates Z_w(y) using a sublinear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumüller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function w(x,y) = e^{ϕ(⟨x,y⟩)} of the inner product on the unit sphere, x, y ∈ 𝒮^{d-1}. Our method leads to data structures with sublinear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to ℝ^d and from scalar functions to vector functions.
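To make the target quantity concrete, the following is a minimal Python sketch of Z_w(y) computed exactly and by uniform random sampling, which is the baseline the abstract says the hashing framework improves upon. It does not implement the paper's hashing-based estimators. The choice ϕ(s) = s − 1, the dimension, the dataset size, and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random


def unit_vector(d, rng):
    """Draw a point uniformly from the unit sphere S^{d-1} (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def w(x, y, phi=lambda s: s - 1.0):
    """Pairwise function w(x, y) = exp(phi(<x, y>)).

    Assumption: phi(s) = s - 1 is a hypothetical convex choice, so w is
    log-convex, and on the unit sphere its values lie in [e^-2, 1].
    """
    inner = sum(a * b for a, b in zip(x, y))
    return math.exp(phi(inner))


def exact_sum(X, y):
    """Z_w(y) = (1/|X|) * sum over x in X of w(x, y); costs |X| evaluations."""
    return sum(w(x, y) for x in X) / len(X)


def sampled_sum(X, y, m, rng):
    """Uniform random sampling baseline: average w over m points drawn from X."""
    return sum(w(rng.choice(X), y) for _ in range(m)) / m


rng = random.Random(0)                      # fixed seed for reproducibility
X = [unit_vector(8, rng) for _ in range(2000)]
y = unit_vector(8, rng)

z_exact = exact_sum(X, y)                   # 2000 evaluations of w
z_est = sampled_sum(X, y, 500, rng)         # 500 evaluations of w
print(z_exact, z_est)
```

Uniform sampling works well in this toy setting because Z_w(y) is not small. Its sample complexity grows as Z_w(y) shrinks, which is the regime where an importance-sampling scheme such as the one described in the abstract is meant to help.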
Keywords
Hashing, Kernel Density, Partition Function Estimation, Importance Sampling, Sublinear algorithms