A Theoretical and Practical Framework for Regression and Classification from Truncated Samples

AISTATS (2020)

Abstract
Machine learning and statistics are invaluable for extracting insights from data. A key assumption of most methods, however, is that they have access to independent samples from the distribution of relevant data. As such, these methods often perform poorly in the face of biased data, which breaks this assumption. In this work, we consider the classical challenge of bias due to truncation, wherein samples falling outside of an "observation window" cannot be observed. We present a general framework for regression and classification from samples that are truncated according to the value of the dependent variable. The framework argues that stochastic gradient descent (SGD) can be efficiently executed on the population log-likelihood of the truncated sample. Our framework is broadly applicable, and we provide end-to-end guarantees for the well-studied problems of truncated logistic and probit regression, where we argue that the true model parameters can be identified from truncated data in a computationally and statistically efficient manner, extending recent work on truncated linear regression. We also provide experiments to illustrate the practicality of our framework on synthetic and real data.
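To make the idea of running SGD on a truncated log-likelihood concrete, here is a minimal sketch for the simpler truncated linear regression setting that the abstract cites as prior work. It is an illustration under stated assumptions, not the paper's algorithm: the noise is assumed to be standard normal with known unit variance, the observation window is assumed to be a known interval [a, b], and the gradient uses the closed-form mean of a truncated normal. All function names and hyperparameters (`make_truncated_data`, `sgd_truncated_regression`, `lr`, `batch`, `steps`) are hypothetical, and the clipping of the window mass is a numerical safeguard added here.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise erf without requiring SciPy

def std_normal_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))

def truncated_mean(mu, a, b):
    """Mean of N(mu, 1) conditioned on falling inside [a, b]."""
    mass = std_normal_cdf(b - mu) - std_normal_cdf(a - mu)
    mass = np.clip(mass, 1e-12, None)  # safeguard (assumption): avoid division by ~0
    return mu + (std_normal_pdf(a - mu) - std_normal_pdf(b - mu)) / mass

def make_truncated_data(w_true, a, b, n_raw=20000, seed=0):
    """Draw y = x.w + N(0, 1) noise and keep only samples with y in [a, b]."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_raw, len(w_true)))
    y = X @ w_true + rng.normal(size=n_raw)
    keep = (y >= a) & (y <= b)
    return X[keep], y[keep]

def sgd_truncated_regression(X, y, a, b, lr=0.05, batch=100, steps=3000, seed=0):
    """Minibatch SGD on the negative log-likelihood of the truncated model.

    For one sample, the gradient with respect to w is
    (E[z | z in [a, b]] - y) * x, where z ~ N(x.w, 1).
    Returns the average of the iterates from the second half of the run.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)
    tail_start = steps // 2
    for t in range(steps):
        idx = rng.integers(0, len(y), size=batch)
        mu = X[idx] @ w
        grad = X[idx].T @ (truncated_mean(mu, a, b) - y[idx]) / batch
        w -= lr * grad
        if t >= tail_start:
            w_sum += w
    return w_sum / (steps - tail_start)

w_true = np.array([1.0, -0.5])
X, y = make_truncated_data(w_true, a=0.0, b=3.0)
w_hat = sgd_truncated_regression(X, y, a=0.0, b=3.0)
print("true parameters:     ", w_true)
print("truncated-SGD estimate:", w_hat)
```

The sketch relies on the window being an interval so that the conditional mean has a closed form. For a general observation window, the same gradient could instead be estimated by sampling from the current model and rejecting draws that fall outside the window, at the cost of extra computation per step.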