# Bayes consistency vs. ℋ-consistency: the interplay between surrogate loss functions and the scoring function class

NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

## Abstract

A fundamental question in multiclass classification concerns the consistency properties of surrogate risk minimization algorithms, which minimize a (often convex) surrogate to the multiclass 0-1 loss. In particular, the framework of calibrated surrogates has played an important role in analyzing Bayes consistency of such algorithms, i.e., in studying convergence to a Bayes optimal classifier (Zhang, 2004; Tewari and Bartlett, 2007). However, follow-up work has suggested that this framework can be of limited value when studying ℋ-consistency; in particular, concerns have been raised that even when the data come from an underlying linear model, minimizing certain convex calibrated surrogates over linear scoring functions fails to recover the true model (Long and Servedio, 2013). In this paper, we investigate this apparent conundrum. We find that while some calibrated surrogates can indeed fail to provide ℋ-consistency when minimized over a natural-looking but naïvely chosen scoring function class ℱ, the situation can potentially be remedied by minimizing them over a more carefully chosen class of scoring functions ℱ. In particular, for the popular one-vs-all hinge and logistic surrogates, both of which are calibrated (and therefore provide Bayes consistency) under realizable models but were previously shown to pose problems for realizable ℋ-consistency, we derive a form of scoring function class ℱ that enables ℋ-consistency. When ℋ is the class of linear models, the class ℱ consists of certain piecewise linear scoring functions that are characterized by the same number of parameters as in the linear case, and over which minimization can be performed using an adaptation of the min-pooling idea from neural network training. Our experiments confirm that the one-vs-all surrogates, when trained over this class of nonlinear scoring functions ℱ, yield better linear multiclass classifiers than when trained over standard linear scoring functions.
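The abstract describes the piecewise linear class ℱ only in words. As a minimal sketch, assuming one natural form rather than the paper's exact definition, the Python below takes f_y(x) = min_{y'≠y} (w_y − w_{y'})·x: each score is a min-pool over K − 1 linear pieces and uses the same K × d weight matrix W as the linear class. It also implements the standard one-vs-all hinge and logistic surrogates so that the two scorings can be compared on a single point. All function names (`min_pooled_scores`, `ova_hinge`, `ova_logistic`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linear_scores(W: Sequence[Vector], x: Vector) -> List[float]:
    """Standard linear scoring class: f_y(x) = w_y . x for each class y."""
    return [dot(w_y, x) for w_y in W]


def min_pooled_scores(W: Sequence[Vector], x: Vector) -> List[float]:
    """ASSUMED piecewise linear scoring class (hypothetical, not the paper's code):
    f_y(x) = min over y' != y of (w_y - w_{y'}) . x.
    Each score is a min-pool over K-1 linear functions, and the parameters
    are the same K x d matrix W used by the linear class."""
    s = linear_scores(W, x)
    K = len(s)
    return [min(s[y] - s[k] for k in range(K) if k != y) for y in range(K)]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ova_margins(scores: Sequence[float], y: int) -> List[float]:
    """Per-class margins t_k * f_k, with t_k = +1 if k == y and -1 otherwise."""
    return [s if k == y else -s for k, s in enumerate(scores)]


def ova_hinge(scores: Sequence[float], y: int) -> float:
    """One-vs-all hinge surrogate: sum_k max(0, 1 - t_k f_k)."""
    return sum(max(0.0, 1.0 - m) for m in ova_margins(scores, y))


def ova_logistic(scores: Sequence[float], y: int) -> float:
    """One-vs-all logistic surrogate: sum_k log(1 + exp(-t_k f_k))."""
    return sum(softplus(-m) for m in ova_margins(scores, y))


def predict(scores: Sequence[float]) -> int:
    """Induced classifier h(x) = argmax_y f_y(x) (first index on ties)."""
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 classes, 2 features
    x = [2.0, 0.5]
    lin, pooled = linear_scores(W, x), min_pooled_scores(W, x)
    print(lin, pooled)                               # [2.0, 0.5, -2.5] [1.5, -1.5, -4.5]
    print(predict(lin), predict(pooled))             # 0 0
    print(ova_hinge(lin, 0), ova_hinge(pooled, 0))   # 1.5 0.0
```

Under this assumed form, f_y(x) = w_y·x − max_{y'≠y} w_{y'}·x, so the min-pooled score is positive exactly for the unique linear argmax class. Both scorings therefore induce the same linear classifier in ℋ; what changes is the set of score values the one-vs-all surrogate is evaluated on. In the example, the linear scores incur a positive hinge loss on a correctly classified point, while the min-pooled scores do not.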
