Reproducibility in Learning

Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC '22), 2022

Abstract
We introduce the notion of a reproducible algorithm in the context of learning. A reproducible learning algorithm is resilient to variations in its samples: with high probability, it returns the exact same output when run on two samples from the same underlying distribution. We begin by unpacking the definition, clarifying how randomness is instrumental in balancing accuracy and reproducibility. We initiate a theory of reproducible algorithms, showing how reproducibility implies desirable properties such as data reuse and efficient testability. Despite the exceedingly strong demand of reproducibility, there are efficient reproducible algorithms for several fundamental problems in statistics and learning. First, we show that any statistical query algorithm can be made reproducible with a modest increase in sample complexity, and we use this to construct reproducible algorithms for finding approximate heavy-hitters and medians. Using these ideas, we give the first reproducible algorithm for learning halfspaces via a reproducible weak learner and a reproducible boosting algorithm. Interestingly, we utilize a connection to foams as a higher-dimensional randomized rounding scheme. Finally, we initiate the study of lower bounds and inherent tradeoffs for reproducible algorithms, giving nearly tight sample complexity upper and lower bounds for reproducible versus nonreproducible SQ algorithms.
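To make the definition concrete, the following is a minimal illustrative sketch (not taken from the paper's text) of one way shared randomness can buy reproducibility for a single statistical query: answer the query by rounding the empirical estimate onto a grid with a randomly chosen offset. Two runs on fresh samples from the same distribution then land on the same grid point with high probability, so they return the exact same output. All names and parameters below (reproducible_sq, alpha, shared_seed) are hypothetical.

```python
import random

def reproducible_sq(samples, query, alpha, shared_seed):
    """Estimate E[query(x)] to within ~alpha, reproducibly across sample sets.

    samples     : data points drawn i.i.d. from the distribution
    query       : function mapping a data point to a value in [0, 1]
    alpha       : grid width controlling the accuracy/reproducibility tradeoff
    shared_seed : randomness shared by both runs (same seed => same grid offset)
    """
    rng = random.Random(shared_seed)
    offset = rng.uniform(0.0, alpha)          # shared random grid offset
    empirical = sum(query(x) for x in samples) / len(samples)
    # Snap the empirical estimate to the nearest point of the offset grid
    # {offset + k * alpha}: if two runs' empirical means differ by much less
    # than alpha, they usually round to the same grid point, hence the same output.
    k = round((empirical - offset) / alpha)
    return offset + k * alpha

# Two independent samples, same shared randomness -> identical output w.h.p.
draw = lambda: random.gauss(0.5, 0.1)
s1 = [draw() for _ in range(100_000)]
s2 = [draw() for _ in range(100_000)]
clamp = lambda x: min(max(x, 0.0), 1.0)
print(reproducible_sq(s1, clamp, alpha=0.05, shared_seed=42))
print(reproducible_sq(s2, clamp, alpha=0.05, shared_seed=42))
```

The grid width alpha makes the accuracy-versus-reproducibility balance mentioned in the abstract visible: a coarser grid makes collisions (identical outputs) more likely across runs but loses precision, while a finer grid is more accurate but more easily perturbed by sampling noise.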
Keywords
Reproducibility, Replicability