Consistent Weighted Sampling

msra(2010)

Abstract
We describe an efficient procedure for sampling representatives from a weighted set such that, for any weightings S and T, the probability that the two choose the same sample is the Jaccard similarity:

$$\Pr\big(\mathrm{sample}(S) = \mathrm{sample}(T)\big) = \frac{\sum_x \min(S(x), T(x))}{\sum_x \max(S(x), T(x))}.$$

The sampling process takes expected time linear in the number of non-zero weights, independent of the weights themselves. We discuss and develop the implementation of our sampling schemes, reducing the requisite computation substantially, and reducing the randomness required to only four bits in expectation.
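To make the stated property concrete, the following is a minimal sketch, not the procedure the paper develops. It assumes non-negative integer weights and uses a naive baseline: each element x with weight S(x) is expanded into the pairs (x, 1), ..., (x, S(x)), and the pair with the smallest hash is returned. This baseline takes time linear in the total weight rather than in the number of non-zero weights, which is the cost the paper's scheme is designed to avoid. The names `weighted_jaccard`, `naive_sample`, and `estimate_similarity` are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple

Weights = Dict[str, int]  # assumption: non-negative integer weights


def weighted_jaccard(s: Weights, t: Weights) -> float:
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    keys = set(s) | set(t)
    num = sum(min(s.get(x, 0), t.get(x, 0)) for x in keys)
    den = sum(max(s.get(x, 0), t.get(x, 0)) for x in keys)
    return num / den if den else 0.0


def _hash(seed: int, x: str, i: int) -> int:
    """Deterministic 64-bit hash of (seed, x, i), shared by all weightings."""
    data = repr((seed, x, i)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def naive_sample(weights: Weights, seed: int) -> Optional[Tuple[str, int]]:
    """Naive baseline: min-hash over the expanded pairs (x, 1..weights[x])."""
    best = None
    for x, w in weights.items():
        for i in range(1, w + 1):
            h = _hash(seed, x, i)
            if best is None or h < best[0]:
                best = (h, x, i)
    return None if best is None else (best[1], best[2])


def estimate_similarity(s: Weights, t: Weights, trials: int = 2000) -> float:
    """Fraction of seeds for which both weightings choose the same sample."""
    hits = sum(naive_sample(s, k) == naive_sample(t, k) for k in range(trials))
    return hits / trials


if __name__ == "__main__":
    S = {"a": 3, "b": 1}
    T = {"a": 1, "b": 2, "c": 1}
    print(weighted_jaccard(S, T))     # exact value: 2 / 6
    print(estimate_similarity(S, T))  # empirical match rate, close to the exact value
```

Because the hash depends only on the seed and the pair (x, i), two weightings pick the same sample exactly when the minimum-hash pair of their union lies in their intersection. The intersection has size equal to the sum of minima and the union has size equal to the sum of maxima, so the match probability is their ratio.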
Keywords
sampling,similarity,locality-sensitive hashing,shingling,information retrieval,web pages