Robust Density Estimation From Batches: The Best Things In Life Are (Nearly) Free

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 (2021)

Abstract
In many applications, data are collected in batches, some of which may be biased, corrupt, or even adversarial. Learning algorithms for this setting have therefore garnered considerable recent attention. In particular, a sequence of works has shown that all approximately piecewise polynomial distributions (including all Gaussian, Gaussian-mixture, log-concave, low-modal, and monotone-hazard distributions) can be learned robustly in polynomial time. However, these results left open the question, stated explicitly in (Chen et al., 2020), of the best possible sample complexity of such algorithms. We answer this question, showing that, perhaps surprisingly, up to logarithmic factors the optimal sample complexity is the same as for genuine, non-adversarial data! To establish the result, we reduce robust learning of approximately piecewise polynomial distributions to robust learning of the probabilities of all subsets of size at most k of a larger discrete domain, and we learn these probabilities with optimal sample complexity that is linear in k, regardless of the domain size. In simulations, the algorithm runs very quickly and estimates distributions to essentially the accuracy achieved when all adversarial batches are removed. The results also imply the first polynomial-time, sample-optimal algorithm for robust interval-based classification from batched data.
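To make the setting concrete, the sketch below illustrates two quantities the abstract refers to; it is not the authors' robust algorithm. The function `subset_distance` (a hypothetical name) computes the largest error max over |S| <= k of |p(S) - q(S)| on a discrete domain. That maximum is attained by taking S to be the k largest positive differences or the k largest negative differences. The `simulate` helper (also hypothetical) contrasts naively pooling all batches with the "oracle" estimate that discards the adversarial batches, the baseline the simulations are compared against. The uniform target, batch sizes, and the adversary that dumps all its mass on one element are illustrative assumptions.

```python
import random
from collections import Counter


def subset_distance(p, q, k):
    """max over subsets S with |S| <= k of |p(S) - q(S)|, for dicts mapping element -> probability."""
    domain = set(p) | set(q)
    diffs = [p.get(x, 0.0) - q.get(x, 0.0) for x in domain]
    # The best S for p(S) - q(S) collects the k largest positive differences;
    # the best S for q(S) - p(S) collects the k largest negative ones.
    pos = sorted((d for d in diffs if d > 0), reverse=True)[:k]
    neg = sorted((-d for d in diffs if d < 0), reverse=True)[:k]
    return max(sum(pos), sum(neg))


def empirical(batches):
    """Pool all samples from the given batches into one empirical distribution."""
    counts = Counter(x for batch in batches for x in batch)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}


def simulate(num_batches=200, batch_size=50, adv_frac=0.25, domain_size=100, seed=0):
    """Return (pooled error, oracle error) for k = 1 under a hypothetical adversary."""
    rng = random.Random(seed)
    true_p = {x: 1.0 / domain_size for x in range(domain_size)}
    n_adv = int(adv_frac * num_batches)
    genuine = [[rng.randrange(domain_size) for _ in range(batch_size)]
               for _ in range(num_batches - n_adv)]
    # Illustrative adversary: every corrupt batch reports element 0 only.
    adversarial = [[0] * batch_size for _ in range(n_adv)]
    pooled = empirical(genuine + adversarial)  # naive: trusts every batch
    oracle = empirical(genuine)                # adversarial batches removed
    return subset_distance(pooled, true_p, 1), subset_distance(oracle, true_p, 1)


pooled_err, oracle_err = simulate()
print(f"pooled error: {pooled_err:.3f}, oracle error: {oracle_err:.3f}")
```

With a quarter of the batches corrupted, the pooled estimate is off by roughly the adversarial fraction, while the oracle estimate stays close to the truth. A robust estimator aims to approach the oracle's accuracy without knowing which batches are adversarial.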