Efficient Algorithms and Lower Bounds for Robust Linear Regression.

SODA '19: Symposium on Discrete Algorithms, San Diego, California, January 2019.

Abstract
We study the prototypical problem of high-dimensional linear regression in a robust model where an ε-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution N(0, Σ) on ℝ^d. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically, our main contributions are as follows:

• For the case that the covariance matrix is known to be the identity, we give a sample near-optimal and computationally efficient algorithm that draws Õ(d/ε²) labeled examples and outputs a candidate hypothesis vector β̂ that approximates the unknown regression vector β within ℓ2-norm O(ε log(1/ε) σ), where σ is the standard deviation of the random observation noise. An error of Ω(εσ) is information-theoretically necessary, even with infinite sample size. Hence, the error guarantee of our algorithm is optimal up to a logarithmic factor in 1/ε. Prior work gave an algorithm for this problem with sample complexity [MATH HERE] whose error guarantee scales with the ℓ2-norm of β.

• For the case of unknown covariance Σ, we show that we can efficiently achieve the same error guarantee of O(ε log(1/ε) σ) as in the known-covariance case, using an additional Õ(d²/ε²) unlabeled examples. On the other hand, an error of O(εσ) can be information-theoretically attained with O(d/ε²) samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial-time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity O(d^(2−c)), where c > 0 is an arbitrarily small constant, must incur an error of [MATH HERE].
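The contamination model described above is straightforward to simulate. The following is a minimal, hypothetical Python sketch that generates ε-corrupted linear regression data with identity covariance and contrasts ordinary least squares with a naive residual-trimming heuristic. It is not the paper's algorithm (which attains the O(ε log(1/ε) σ) guarantee); all parameter values and the specific adversary are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): dimension d, sample size n,
# corruption fraction eps, noise standard deviation sigma.
d, n, eps, sigma = 20, 5000, 0.05, 1.0

# Uncorrupted model: x ~ N(0, I_d), y = <beta, x> + N(0, sigma^2).
beta = rng.normal(size=d)
beta /= np.linalg.norm(beta)
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Adversarially replace an eps-fraction of the labeled samples
# (a crude adversary that plants points aligned with -beta).
k = int(eps * n)
bad = rng.choice(n, size=k, replace=False)
X[bad] = rng.normal(size=(k, d))
y[bad] = 50.0 * X[bad] @ (-beta)

# Non-robust baseline: ordinary least squares on all samples.
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Naive robust heuristic (NOT the paper's algorithm): repeatedly refit
# after trimming the samples with the largest absolute residuals.
keep = np.arange(n)
est = ols.copy()
for _ in range(10):
    resid = np.abs(X[keep] @ est - y[keep])
    cutoff = np.quantile(resid, 1.0 - eps)
    keep = keep[resid <= cutoff]
    est, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

print("OLS error     :", np.linalg.norm(ols - beta))
print("Trimmed error :", np.linalg.norm(est - beta))
```

Running the sketch typically shows the trimmed fit recovering β far better than plain least squares, which the planted outliers can pull arbitrarily far from the truth.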