Odyssey 2012 - The Speaker and Language Recognition Workshop
It is obvious how to evaluate the goodness of a pattern classifier that outputs hard classification decisions - you count the errors. But hard decisions implicitly depend on fixed priors and costs, so they are applicable only in a narrow range of applications. A classifier can widen its range of applicability by instead outputting soft decisions, in the form of class probabilities or likelihoods. However, it is much less obvious how to evaluate the goodness of such probabilistic outputs. Recognized classes can simply be compared to the true class labels in a supervised evaluation database, but no similar truth reference exists for probabilistic outputs.
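The dependence of hard decisions on priors and costs can be made concrete with the standard Bayes decision rule for a two-class detector. The following sketch (not part of the abstract; the function name and parameters are illustrative) shows how the same recognizer score can yield opposite decisions under different priors:

```python
import math

def bayes_decision(log_likelihood_ratio, p_target, c_miss, c_fa):
    """Hard decision minimizing expected cost: accept the target
    hypothesis when the log-likelihood-ratio exceeds the Bayes
    threshold, which is fixed by the prior and the two error costs."""
    threshold = math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))
    return log_likelihood_ratio > threshold

# The same score gives opposite decisions under different priors:
llr = 0.5
print(bayes_decision(llr, p_target=0.5, c_miss=1, c_fa=1))  # True
print(bayes_decision(llr, p_target=0.1, c_miss=1, c_fa=1))  # False
```

A classifier that outputs the log-likelihood-ratio itself, rather than the thresholded decision, lets every user plug in their own priors and costs.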
A solution to this problem, called "proper scoring rules", originated in weather prediction and has been known for several decades, but has enjoyed only limited attention in pattern recognition and machine learning. This talk will explain how proper scoring rules work, how they generalize error-rate, how they measure information, and how to use them for both training and evaluation of probabilistic pattern recognizers.
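As an illustrative sketch (not taken from the abstract), two classic proper scoring rules are the logarithmic score and the Brier score. Each assigns a cost to a probabilistic output given the true class label from a supervised evaluation database, so that the expected cost is minimized only by reporting honest, well-calibrated probabilities:

```python
import math

def log_score(p_true_class):
    """Logarithmic scoring rule: cost is -log of the probability the
    recognizer assigned to the class that turned out to be true."""
    return -math.log(p_true_class)

def brier_score(probs, true_class):
    """Brier (quadratic) scoring rule applied to a vector of class
    probabilities, given the index of the true class."""
    return sum((p - (1.0 if k == true_class else 0.0)) ** 2
               for k, p in enumerate(probs))

# Averaging such scores over a labelled test set evaluates the
# probabilistic outputs directly: confident-and-correct is cheap,
# confident-and-wrong is expensive.
probs = [0.8, 0.2]          # recognizer's posterior over two classes
print(log_score(probs[0]))  # true class is 0
print(brier_score(probs, 0))
```

The average logarithmic score is also the negative log-likelihood used as a training objective, which is one sense in which proper scoring rules serve both training and evaluation.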
Bibliographic reference. Brümmer, Niko (2012): "The role of proper scoring rules in training and evaluating probabilistic speaker and language recognizers", In Odyssey-2012 (abstract).