Provable training set debugging for linear regression

Machine Learning (2021)

Abstract

We investigate problems in penalized M-estimation, inspired by applications in machine learning debugging. Data are collected from two pools: one containing data with possibly contaminated labels, and another known to contain only cleanly labeled points. We first formulate a general statistical algorithm for identifying buggy points and provide rigorous theoretical guarantees when the data follow a linear model. We then propose a procedure, with theoretical guarantees, for selecting the tuning parameter of our Lasso-based algorithm. Next, we consider a two-person “game” played between a bug generator and a debugger, in which the debugger can augment the contaminated data set with cleanly labeled versions of points in the original data pool. We develop and analyze a debugging strategy formulated as a mixed integer linear program (MILP). Finally, we provide empirical results that verify our theoretical results and demonstrate the utility of the MILP strategy.

Keywords

Robust statistics, Outlier detection, Tuning parameter selection, Optimization
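
The abstract does not state the estimator itself, so the following is only an illustrative sketch of how a Lasso-based debugger for two data pools might look. It assumes a mean-shift model y_i = x_i * beta + gamma_i + noise for the possibly contaminated pool, where a nonzero gamma_i marks a buggy label, and it minimizes (1/2n) * sum_i (y_i - x_i * beta - gamma_i)^2 + (eta/2m) * sum_j (y~_j - x~_j * beta)^2 + lam * sum_i |gamma_i| by alternating exact updates of beta and gamma. The objective, the single-feature no-intercept setting, the weight `eta`, the function names, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def debug_two_pools(x, y, x_clean, y_clean, lam, eta=1.0, n_iter=200):
    """Alternating minimization of the assumed objective (hypothetical helper).

    x, y             : possibly contaminated pool (one feature, no intercept)
    x_clean, y_clean : pool known to be cleanly labeled
    lam              : Lasso tuning parameter on the per-point shifts gamma
    eta              : assumed weight on the clean pool
    Returns (beta, gamma, flagged), where flagged lists indices with gamma != 0.
    """
    n, m = len(x), len(x_clean)
    gamma = [0.0] * n
    # Terms of the beta update that do not change across iterations.
    denom = sum(xi * xi for xi in x) / n + eta * sum(xj * xj for xj in x_clean) / m
    clean_term = eta * sum(xj * yj for xj, yj in zip(x_clean, y_clean)) / m
    beta = 0.0
    for _ in range(n_iter):
        # beta step: weighted least squares on shift-corrected labels.
        numer = sum(xi * (yi - gi) for xi, yi, gi in zip(x, y, gamma)) / n + clean_term
        beta = numer / denom
        # gamma step: soft-threshold each residual at n * lam.
        gamma = [soft_threshold(yi - xi * beta, n * lam) for xi, yi in zip(x, y)]
    flagged = [i for i, gi in enumerate(gamma) if gi != 0.0]
    return beta, gamma, flagged


# Toy data: true slope 2, with the label at index 2 shifted by +5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 11.0, 8.0, 10.0, 12.0]
x_clean = [1.5, 2.5]
y_clean = [3.0, 5.0]

beta_hat, gamma_hat, flagged = debug_two_pools(x, y, x_clean, y_clean, lam=0.1)
print("estimated slope:", round(beta_hat, 3))
print("flagged indices:", flagged)
```

In this sketch, a larger `lam` flags fewer points and a smaller one flags more, which is why the choice of tuning parameter matters for support recovery.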
