Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms

NIPS 2004

Abstract
In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the "variance of prediction error", or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross validation overestimates error.
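The sketch below illustrates the core co-validation idea described in the abstract: train two models of the same family on independent labeled sets, measure how often they disagree on an unlabeled pool, and use that disagreement rate as a lower bound on prediction error when comparing candidate model families. The dataset, model choices, split sizes, and the exact selection rule are illustrative assumptions, not the paper's experimental protocol.

```python
# A minimal sketch of co-validation, assuming scikit-learn-style classifiers.
# Disagreement of two independently trained models on unlabeled data is used
# as a cheap lower bound on prediction error for comparing model families.

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def disagreement(model_a, model_b, X_unlabeled):
    """Fraction of unlabeled instances on which the two models disagree."""
    preds_a = model_a.predict(X_unlabeled)
    preds_b = model_b.predict(X_unlabeled)
    return float(np.mean(preds_a != preds_b))


# Synthetic binary classification data standing in for a real task (assumption).
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# Two small, independent labeled training sets and a large unlabeled pool.
X_train1, y_train1 = X[:200], y[:200]
X_train2, y_train2 = X[200:400], y[200:400]
X_unlabeled = X[400:]  # labels ignored; co-validation only needs the inputs

# Candidate model families to compare (illustrative choices).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

for name, prototype in candidates.items():
    # Train two independent copies of the same model family.
    model_a = clone(prototype).fit(X_train1, y_train1)
    model_b = clone(prototype).fit(X_train2, y_train2)

    d = disagreement(model_a, model_b, X_unlabeled)
    # Disagreement lower-bounds the prediction error, so a candidate with
    # high disagreement cannot have low generalization error.
    print(f"{name}: disagreement = {d:.3f} (lower bound on prediction error)")
```

Because disagreement is only a lower bound, a low value does not by itself certify low error; in a model-selection setting it serves to rule out candidates whose disagreement is already too high.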
Keywords
prediction error, lower bound, model selection, unbiased estimator, upper bound, binary classification, active learning, cross validation