Domain constraints improve risk prediction when outcome data is missing
ICLR 2024 (2023)
Abstract
Machine learning models are often trained to predict the outcome resulting
from a human decision. For example, if a doctor decides to test a patient for
disease, will the patient test positive? A challenge is that the human decision
censors the outcome data: we only observe test outcomes for patients doctors
historically tested. Untested patients, for whom outcomes are unobserved, may
differ from tested patients along observed and unobserved dimensions. We
propose a Bayesian model class which captures this setting. The purpose of the
model is to accurately estimate risk for both tested and untested patients.
Estimating this model is challenging due to the wide range of possibilities for
untested patients. To address this, we propose two domain constraints which are
plausible in health settings: a prevalence constraint, where the overall
disease prevalence is known, and an expertise constraint, where the human
decision-maker deviates from purely risk-based decision-making only along a
constrained feature set. We show theoretically and on synthetic data that
domain constraints improve parameter inference. We apply our model to a case
study of cancer risk prediction, showing that the model's inferred risk
predicts cancer diagnoses, its inferred testing policy captures known public
health policies, and it can identify suboptimalities in test allocation. Though
our case study is in healthcare, our analysis reveals a general class of domain
constraints which can improve model estimation in many settings.
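
The censoring described above, and the way a prevalence constraint pins down the untested group, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's Bayesian model: the logistic risk and testing functions, their coefficients, and the helper names `simulate` and `untested_prevalence` are all hypothetical. The overall prevalence is computed from the simulated data and stands in for a value that would, in practice, be known from an external source.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, seed=0):
    """Simulate patients whose outcome is observed only if they were tested.

    Assumed (hypothetical) setup: risk depends on an observed feature x and
    an unobserved feature u; doctors test higher-risk patients more often.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                  # observed feature
        u = rng.gauss(0.0, 1.0)                  # unobserved feature
        risk = sigmoid(x + u - 2.0)              # true disease risk
        tested = rng.random() < sigmoid(x + u)   # testing rises with risk
        y = rng.random() < risk                  # outcome, hidden unless tested
        patients.append((x, tested, y))
    return patients


def untested_prevalence(known_prevalence, frac_tested, tested_prevalence):
    """Solve the law of total probability for the untested group's prevalence."""
    return (known_prevalence - frac_tested * tested_prevalence) / (1.0 - frac_tested)


patients = simulate()
n = len(patients)
tested = [p for p in patients if p[1]]
untested = [p for p in patients if not p[1]]

frac_tested = len(tested) / n
prev_tested = sum(p[2] for p in tested) / len(tested)
# Stand-in for an externally known overall prevalence.
prev_all = sum(p[2] for p in patients) / n
# Ground truth that would be unobservable in real data.
prev_untested_true = sum(p[2] for p in untested) / len(untested)
prev_untested_implied = untested_prevalence(prev_all, frac_tested, prev_tested)

print(f"prevalence among tested:    {prev_tested:.3f}")
print(f"overall prevalence:         {prev_all:.3f}")
print(f"implied untested prevalence: {prev_untested_implied:.3f}")
```

In this sketch the prevalence among tested patients overstates the overall prevalence, because testing is concentrated on higher-risk patients. Knowing the overall prevalence recovers the average outcome rate of the untested group without observing any of their outcomes, which is one intuition for why such a constraint narrows the range of possibilities for untested patients.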
Keywords
Bayesian model, health, selective labels, distribution shift, domain constraint, biomedicine