14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Confidence-Based Scoring: A Useful Diagnostic Tool for Detection Tasks

T. J. Tsai (1), Adam Janin (2)

(1) University of California at Berkeley, USA

This paper uses an unconventional analysis as a tool to diagnose the problems with three different speech activity detection systems. The analysis scores the frames in an audio file in order of confidence, starting with the frame in which we have the most confidence and progressing toward less and less confident frames. By keeping track of the cumulative number of errors, we can determine how the errors are distributed across the data. Using speech activity detection on highly degraded audio as a case example, we show how this simple analysis can yield useful insight into both system performance and the data itself. In our case example, we use the analysis to establish three main points. First, a small percentage of the frames accounts for the lion's share of the errors. Second, three different systems perform very poorly on the same small subset of data, despite the fact that the systems adopt very different decoding algorithms and features. In other words, three very different systems agree on which data is "hard". Third, the "hard" data is primarily characterized by its proximity to speech-nonspeech boundaries. Through follow-up analyses, we show that this phenomenon is not merely an artifact of ground-truth inaccuracy, but rather a steady progression: the data becomes harder and harder to classify correctly as one moves closer to the boundaries. Through this case example, we demonstrate the utility of confidence-based scoring as a general diagnostic tool for detection tasks on time-series data.
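The core of the scoring procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each system emits a per-frame speech probability, that frame decisions are made at a fixed threshold of 0.5, and that a frame's confidence is the distance of its probability from that threshold. The function names `confidence_based_scoring` and `error_share_of_least_confident` are hypothetical.

```python
from typing import List, Sequence, Tuple


def confidence_based_scoring(
    speech_probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> List[int]:
    """Score frames from most to least confident; return cumulative error counts.

    Hypothetical helper. Assumes one speech probability and one 0/1 reference
    label (1 = speech) per frame, with confidence defined as the distance of
    the probability from the decision threshold (an assumption).
    """
    if len(speech_probs) != len(labels):
        raise ValueError("need one reference label per frame")
    frames: List[Tuple[float, bool]] = []
    for p, y in zip(speech_probs, labels):
        decision = 1 if p >= threshold else 0
        frames.append((abs(p - threshold), decision != y))
    # Most confident frames first; the sort is stable, so ties keep frame order.
    frames.sort(key=lambda f: f[0], reverse=True)
    cumulative: List[int] = []
    errors = 0
    for _, is_error in frames:
        errors += int(is_error)
        cumulative.append(errors)
    return cumulative


def error_share_of_least_confident(cumulative: Sequence[int], fraction: float) -> float:
    """Share of all errors that fall in the least-confident `fraction` of frames."""
    total = cumulative[-1] if cumulative else 0
    if total == 0:
        return 0.0
    # Number of frames in the confident part of the ranking.
    n_confident = round(len(cumulative) * (1 - fraction))
    errors_in_confident = cumulative[n_confident - 1] if n_confident > 0 else 0
    return (total - errors_in_confident) / total


if __name__ == "__main__":
    probs = [0.97, 0.88, 0.6, 0.42, 0.15, 0.01]  # toy per-frame speech probabilities
    refs = [1, 1, 0, 1, 0, 0]                     # toy reference labels
    curve = confidence_based_scoring(probs, refs)
    print(curve)                                        # [0, 0, 0, 0, 1, 2]
    print(error_share_of_least_confident(curve, 0.5))   # 1.0
```

On this toy input, both errors occur in the least-confident half of the frames, which is the kind of concentration the cumulative error curve is meant to expose. Comparing such curves across systems, or restricting them to frames near speech-nonspeech boundaries, would follow the same pattern.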


Bibliographic reference. Tsai, T. J. / Janin, Adam (2013): "Confidence-based scoring: a useful diagnostic tool for detection tasks", In INTERSPEECH-2013, 737-741.