Consensus–relevance kNN and covariate shift mitigation

MACHINE LEARNING(2023)

引用 0|浏览1
暂无评分
摘要
Classification and regression algorithms based on k-nearest neighbors (kNN) are often ranked among the top-10 Machine learning algorithms, due to their performance, flexibility, interpretability, non-parametric nature, and computational efficiency. Nevertheless, in existing kNN algorithms, the kNN radius, which plays a major role in the quality of kNN estimates, is independent of any weights associated with the training samples in a kNN-neighborhood. This omission, besides limiting the performance and flexibility of kNN, causes difficulties in correcting for covariate shift (e.g., selection bias) in the training data, taking advantage of unlabeled data, domain adaptation and transfer learning. We propose a new weighted kNN algorithm that, given training samples, each associated with two weights, called consensus and relevance (which may depend on the query on hand as well), and a request for an estimate of the posterior at a query, works as follows. First, it determines the kNN neighborhood as the training samples within the kth relevance-weighted order statistic of the distances of the training samples from the query. Second, it uses the training samples in this neighborhood to produce the desired estimate of the posterior (output label or value) via consensus-weighted aggregation as in existing kNN rules. Furthermore, we show that kNN algorithms are affected by covariate shift, and that the commonly used sample reweighing technique does not correct covariate shift in existing kNN algorithms. We then show how to mitigate covariate shift in kNN decision rules by using instead our proposed consensus-relevance kNN algorithm with relevance weights determined by the amount of covariate shift (e.g., the ratio of sample probability densities before and after the shift). Finally, we provide experimental results, using 197 real datasets, demonstrating that the proposed approach is slightly better (in terms of F_1 score) on average than competing benchmark approaches for mitigating selection bias, and that there are quite a few datasets for which it is significantly better.
更多
查看译文
关键词
consensus–relevance
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要