The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination

2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS 2023)

Abstract
We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing $\mathcal{N}(0, I)$ against $\mathcal{N}(\alpha v, I)$, where $\|v\|_2 = 1$, with an $\varepsilon$-fraction of adversarial corruptions, to be $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^3}{\alpha^4}, \min\left(\frac{d^{2/3}\varepsilon^{2/3}}{\alpha^{8/3}}, \frac{d\varepsilon}{\alpha^2}\right)\right)\right)$ against oblivious adversaries, while the complexity against adaptive adversaries is $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4}\right)\right)$, which is strictly worse for a large range of vanishing $\varepsilon, \alpha$. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4}\right)\right)$, and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting.
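To get a feel for the size of the separation, the two rates can be evaluated numerically. The following is a minimal sketch, not part of the paper: it drops the polylogarithmic factors hidden by the tilde and all constants, and it reads the adaptive rate as max(sqrt(d)/alpha^2, d*eps^2/alpha^4), i.e. with eps squared, which is the exponent under which the adaptive bound dominates the oblivious one as the abstract claims. The function names and the sample parameters are hypothetical choices made for illustration.

```python
import math


def oblivious_rate(d: float, eps: float, alpha: float) -> float:
    """Oblivious-adversary rate from the abstract, up to constants and log factors."""
    return max(
        math.sqrt(d) / alpha**2,
        d * eps**3 / alpha**4,
        min(
            d ** (2 / 3) * eps ** (2 / 3) / alpha ** (8 / 3),
            d * eps / alpha**2,
        ),
    )


def adaptive_rate(d: float, eps: float, alpha: float) -> float:
    """Adaptive-adversary rate, up to constants and log factors.

    Assumption: the second term is d * eps^2 / alpha^4 (eps squared).
    """
    return max(math.sqrt(d) / alpha**2, d * eps**2 / alpha**4)


if __name__ == "__main__":
    # Hypothetical parameters: high dimension, small eps and alpha.
    d, eps, alpha = 10**6, 0.01, 0.05
    obl = oblivious_rate(d, eps, alpha)
    ada = adaptive_rate(d, eps, alpha)
    print(f"oblivious ~ {obl:.3g}, adaptive ~ {ada:.3g}, ratio ~ {ada / obl:.3g}")
```

Under this reading the adaptive rate is never smaller than the oblivious one: for eps at most 1 it dominates d*eps^3/alpha^4 directly, and max(a, b) >= a^(2/3) * b^(1/3) with a = sqrt(d)/alpha^2 and b = d*eps^2/alpha^4 gives exactly the d^(2/3)*eps^(2/3)/alpha^(8/3) term inside the min.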
Keywords
Property testing,robust statistics,identity testing,corrupted data,data poisoning