Symposium on Machine Learning in Speech and Language Processing (MLSLP)
Bellevue, WA, USA
Linear predictors are scale-insensitive: the prediction does not change when the weight vector defining the predictor is scaled up or down. This implies that direct regularization of the performance of a linear predictor with a scale-sensitive regularizer (such as a norm of the weight vector) is meaningless. Linear predictors are typically learned by introducing a scale-sensitive surrogate loss function, such as the hinge loss of an SVM. However, no convex surrogate loss function can be consistent in general; in finite dimension, SVMs are not consistent. Here we generalize probit loss and ramp loss to the latent-structural setting and show that both of these loss functions are consistent in arbitrary dimension for an arbitrary bounded task loss. Empirical experience with probit loss and ramp loss will be briefly discussed.
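The scale-insensitivity argument can be illustrated with a minimal sketch in the binary special case. This is not the latent-structural formulation of the paper. The function names, the labels in {-1, +1}, the 0-1 task loss, and the unit-variance Gaussian perturbation of the weights used for probit loss are all assumptions made for illustration.

```python
import math

def score(w, x):
    """Inner product w . x of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    """Binary linear prediction: the sign of the score (ties go to +1)."""
    return 1 if score(w, x) >= 0 else -1

def hinge_loss(w, x, y):
    """Convex, scale-sensitive surrogate: max(0, 1 - y * (w . x))."""
    return max(0.0, 1.0 - y * score(w, x))

def ramp_loss(w, x, y):
    """Hinge loss clipped at 1 (assumed binary form): non-convex, bounded."""
    return min(1.0, hinge_loss(w, x, y))

def probit_loss(w, x, y):
    """Probability that predict(w + eps, x) != y for eps ~ N(0, I).

    Assumed binary form: eps . x ~ N(0, ||x||^2), so the error
    probability is Phi(-y * (w . x) / ||x||). Requires x != 0.
    """
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    z = y * score(w, x) / norm_x
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals Phi(-z)

# Hypothetical example point and weights, misclassified by w.
w = [1.0, -2.0]
x = [0.5, 1.0]
y = 1
w_scaled = [10.0 * wi for wi in w]

print(predict(w, x), predict(w_scaled, x))            # same prediction
print(hinge_loss(w, x, y), hinge_loss(w_scaled, x, y))  # hinge grows with scale
print(ramp_loss(w, x, y), ramp_loss(w_scaled, x, y))    # ramp stays at most 1
print(probit_loss(w, x, y), probit_loss(w_scaled, x, y))  # probit stays in [0, 1]
```

In this sketch, scaling the weights leaves the prediction unchanged while the hinge loss on a misclassified point grows without bound. The ramp and probit values remain bounded by the maximum of the 0-1 task loss.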
Bibliographic reference. McAllester, David (2011): "Generalization bounds and consistency for latent-structural probit and ramp loss", In MLSLP-2011.