Testing conditional independence of discrete distributions
STOC '18: Symposium on Theory of Computing, Los Angeles, CA, USA, June 2018
Abstract
We study the problem of testing *conditional independence* for discrete distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1] \times [\ell_2] \times [n]$, we want to distinguish, with probability at least $2/3$, between the case that $X$ and $Y$ are conditionally independent given $Z$ and the case that $(X, Y, Z)$ is $\varepsilon$-far, in $\ell_1$-distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics with important applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics community for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing, and in particular no tester with *sublinear* sample complexity is known, even for the important special case that the domains of $X$ and $Y$ are binary. The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity. To complement our upper bound, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting when $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is [complex formula not displayed]. We also achieve sublinear sample complexity for the general case of arbitrary $\ell_1$, $\ell_2$, and $n$.
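To make the property concrete: $X$ and $Y$ are conditionally independent given $Z$ exactly when $p(x,y,z)\,p(z) = p(x,z)\,p(y,z)$ for every $(x,y,z)$. The sketch below assumes the full joint pmf is known as a NumPy array of shape `(l1, l2, n)`, unlike the sample-access setting studied here, and builds $q(x,y,z) = p(x,z)\,p(y,z)/p(z)$. Because $q$ always has the property, $\|p - q\|_1$ is an upper bound on the $\ell_1$-distance from $p$ to the set of conditionally independent distributions (not in general the exact distance), and it is zero exactly when $p$ itself is conditionally independent. The function names `ci_projection` and `l1_to_ci_projection` are hypothetical, and this is not the tester from the paper.

```python
import numpy as np


def ci_projection(p: np.ndarray) -> np.ndarray:
    """Given a joint pmf p of shape (l1, l2, n), return q(x,y,z) = p(x,z) p(y,z) / p(z).

    q keeps the (X,Z) and (Y,Z) marginals of p and makes X and Y conditionally
    independent given Z; slices with p(z) = 0 stay identically zero.
    """
    p_xz = p.sum(axis=1)        # shape (l1, n)
    p_yz = p.sum(axis=0)        # shape (l2, n)
    p_z = p.sum(axis=(0, 1))    # shape (n,)
    # Avoid 0/0: wherever p(z) = 0 the numerator is also 0, so dividing by 1 yields 0.
    safe_pz = np.where(p_z > 0, p_z, 1.0)
    return np.einsum("xz,yz->xyz", p_xz, p_yz) / safe_pz


def l1_to_ci_projection(p: np.ndarray) -> float:
    """Return ||p - q||_1 for q = ci_projection(p).

    This is zero iff p is conditionally independent, and otherwise an upper
    bound on (not necessarily equal to) the l1-distance from p to the property.
    """
    return float(np.abs(p - ci_projection(p)).sum())
```

For instance, with $n = 1$ and $X = Y$ a uniform bit, $q$ is uniform over the four $(x, y)$ pairs and the returned value is $1$, so that distribution is far from the property.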
To obtain our tester, we employ a variety of tools, including (1) a subtle weighted adaptation of the "flattening" technique, and (2) the design and analysis of an optimal (unbiased) estimator for the following statistical problem of independent interest: given a degree-$d$ polynomial $Q\colon \mathbb{R}^n \to \mathbb{R}$ and sample access to a distribution $p$ over $[n]$, estimate $Q(p_1, \ldots, p_n)$ up to small additive error. Obtaining tight variance analyses for specific estimators of this form has been a major technical hurdle in distribution testing. As an important contribution of this work, we develop a general theory providing tight variance bounds for *all* such estimators.
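A standard way to build an unbiased estimator for such a polynomial (not necessarily the estimator constructed in the paper) uses falling factorials $(x)_k = x(x-1)\cdots(x-k+1)$. With $m$ i.i.d. samples and counts $N_i$, the multinomial factorial-moment identity gives $\mathbb{E}\bigl[\prod_i (N_i)_{a_i}\bigr] = (m)_d \prod_i p_i^{a_i}$ for any exponent vector with $\sum_i a_i = d \le m$. Dividing by $(m)_d$ and summing over the monomials of $Q$ therefore gives an unbiased estimate by linearity. The sketch assumes a hypothetical input format in which $Q$ is a dictionary from exponent tuples to coefficients and samples are integers in $\{0, \ldots, n-1\}$. It says nothing about variance, which is where the paper's analysis lies.

```python
from collections import Counter
from math import prod


def falling(x: int, k: int) -> int:
    """Falling factorial x(x-1)...(x-k+1); equals 1 for k == 0 and 0 when 0 <= x < k."""
    return prod(x - j for j in range(k))


def estimate_polynomial(samples: list[int], n: int,
                        poly: dict[tuple[int, ...], float]) -> float:
    """Unbiased estimate of Q(p_1, ..., p_n) from i.i.d. samples over {0, ..., n-1}.

    `poly` (hypothetical format) maps an exponent tuple (a_1, ..., a_n)
    to the coefficient of the monomial prod_i p_i^{a_i}.
    """
    m = len(samples)
    counts = Counter(samples)
    N = [counts.get(i, 0) for i in range(n)]
    estimate = 0.0
    for exps, coef in poly.items():
        if len(exps) != n:
            raise ValueError("exponent tuple must have length n")
        d = sum(exps)
        if d > m:
            raise ValueError("need at least as many samples as the monomial's degree")
        # For multinomial counts: E[prod_i (N_i)_{a_i}] = (m)_d * prod_i p_i^{a_i}.
        numerator = prod(falling(N[i], a) for i, a in enumerate(exps))
        estimate += coef * numerator / falling(m, d)
    return estimate


if __name__ == "__main__":
    import random

    p = [0.5, 0.3, 0.2]
    samples = random.choices(range(3), weights=p, k=10_000)
    # Collision probability sum_i p_i^2; its true value here is 0.38, and the
    # printed estimate varies from run to run around that value.
    collision = {(2, 0, 0): 1.0, (0, 2, 0): 1.0, (0, 0, 2): 1.0}
    print(estimate_polynomial(samples, 3, collision))
```

For $Q(p) = \sum_i p_i^2$, each monomial contributes $N_i(N_i - 1)/(m(m-1))$, so the construction recovers the familiar collision-count estimator.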
Keywords
Distribution testing, property testing, probability distributions, conditional independence, discrete distributions, sublinear algorithms, hypothesis testing