Testing k-Wise Independence over Streaming Data
msra(2010)
Abstract
Following on the work of Indyk and McGregor [5], we consider the problem of identifying correlations in data streams. They consider a model where a stream of pairs $(i,j) \in [n]^2$ arrives, giving a joint distribution $(X,Y)$. They find approximation algorithms for how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close $X$ and $Y$ are to being independent. We extend their main result to higher dimensions, where a stream of $m$ $k$-dimensional vectors in $[n]^k$ arrives, and we wish to approximate the $\ell_2$ distance between the joint distribution and the product of the marginal distributions in a single pass. Our analysis gives a randomized algorithm that is a $(1 \pm \epsilon)$ approximation (with probability $1 - \delta$) that requires space logarithmic in $n$ and $m$ and proportional to $3^k$.
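To make the target quantity concrete, the following is a minimal sketch of an exact, non-streaming baseline that computes the l2 distance between the empirical joint distribution and the product of the empirical marginals. It is not the algorithm of the paper: it stores the whole stream and enumerates all n^k points, which is exactly the cost a small-space single-pass algorithm avoids. The function name `l2_distance_from_independence` and the 0-indexed coordinate convention (values in 0..n-1) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def l2_distance_from_independence(stream, n, k):
    """Exact l2 distance between the empirical joint distribution of a
    stream of k-tuples over {0, ..., n-1} and the product of its marginals.

    Brute-force baseline (assumption: the stream fits in memory)."""
    stream = [tuple(v) for v in stream]
    m = len(stream)

    # Empirical joint counts and per-coordinate marginal counts.
    joint = Counter(stream)
    marginals = [Counter(v[c] for v in stream) for c in range(k)]

    total = 0.0
    # Enumerate every point of the n^k grid, including unseen ones,
    # since the product distribution can put mass where the joint has none.
    for point in product(range(n), repeat=k):
        p_joint = joint[point] / m
        p_prod = prod(marginals[c][point[c]] / m for c in range(k))
        total += (p_joint - p_prod) ** 2
    return sqrt(total)


# Perfectly correlated pairs: the joint puts mass 1/2 on (0,0) and (1,1),
# while the product of marginals puts 1/4 on each of the four points.
correlated = [(0, 0), (1, 1)]
# Uniform over all four pairs: the joint equals the product of marginals.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(l2_distance_from_independence(correlated, n=2, k=2))   # 0.5
print(l2_distance_from_independence(independent, n=2, k=2))  # 0.0
```

A distance of zero means the empirical coordinates are exactly independent; larger values indicate stronger correlation among the k coordinates.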