Computationally Feasible Near-Optimal Subset Selection for Linear Regression under Measurement Constraints
arXiv: Machine Learning (2016)
Abstract
Computationally feasible and statistically near-optimal subset selection strategies are derived to select a small portion of design (data) points in a linear regression model $y = X\beta + \varepsilon$, reducing measurement cost while improving data efficiency. We consider two subset selection algorithms for estimating the model coefficients $\beta$: the first is a random-subsampling-based method that achieves optimal statistical performance up to a small $(1+\epsilon)$ relative factor under the with-replacement model, and up to an $O(\log k)$ multiplicative factor under the without-replacement model, where $k$ denotes the measurement budget. The second algorithm is fully deterministic and achieves a $(1+\epsilon)$ relative approximation under the without-replacement model, at the cost of a slightly worse dependency of $k$ on the number of variables (the data dimension) in the linear regression model. Finally, we show how our method extends to the corresponding prediction problem, and we remark on interpretable sampling (selection) of data points under random design frameworks.
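The random-subsampling approach described above can be illustrated with a generic leverage-score subsampling sketch for linear regression. This is an assumption-laden illustration of the general technique, not the paper's exact algorithm: rows of the design matrix $X$ are sampled with replacement proportionally to their statistical leverage scores, and the coefficients $\beta$ are estimated by reweighted least squares on the selected rows so that only $k$ measurements of $y$ are needed.

```python
import numpy as np


def leverage_score_subsample(X, k, seed=None):
    """Sample k row indices of X with replacement, with probabilities
    proportional to statistical leverage scores.

    Generic illustration only -- not the paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    # Leverage scores are the squared row norms of U from the thin SVD of X.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    lev = (U ** 2).sum(axis=1)
    probs = lev / lev.sum()
    idx = rng.choice(X.shape[0], size=k, replace=True, p=probs)
    return idx, probs[idx]


def estimate_beta(X, y, idx, probs):
    """Weighted least squares on the selected rows.

    Each selected row is rescaled by 1/sqrt(k * p_i), the standard
    importance-weighting that keeps the subsampled normal equations
    unbiased for the full-data least-squares problem.
    """
    w = 1.0 / np.sqrt(len(idx) * probs)
    Xs = X[idx] * w[:, None]
    ys = y[idx] * w
    beta_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta_hat
```

In a measurement-constrained setting, only the responses `y[idx]` for the $k$ selected design points would actually be measured; the sketch above evaluates the estimator on already-available data purely for demonstration.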