15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Normalization of ASR Confidence Classifier Scores via Confidence Mapping

Kshitiz Kumar, Chaojun Liu, Yifan Gong

Microsoft, USA

A speech recognition confidence classifier (CC) score quantifies the correctness of a decoded utterance on a [0,1] scale. We associate an operating threshold with the classifier and accept recognitions with scores greater than the threshold. Speech developers may set their own thresholds, but an acoustic model (AM) or CC update often alters the correct-accept (CA) vs. false-accept (FA) profile, necessitating threshold reselection. This is particularly a problem when (a) the threshold is hardcoded in shipped hardware or software, (b) developers lack the expertise for threshold tuning, or (c) tuning is not cost-effective and may need to be repeated often. To our knowledge, our work is the first to present this practical and interesting problem of avoiding threshold reselection, and it proposes novel confidence-mapping-based techniques to improve or retain both CA and FA at previously set thresholds. We propose and evaluate (a) histogram-based mapping, (b) polynomial-fitting, and (c) tanh-fitting methods to map the confidences associated with false recognitions, and we discuss their issues and benefits. In our tests, all of these mapping methods turn a mean CA regression of 21% into a gain of 1–2%, with tanh-mapping providing the best CA and FA tradeoff.
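
The abstract does not give the construction of the mappings, so the following is only a minimal sketch of one plausible reading of the histogram-based variant. It assumes that scores of false recognitions are available for both the old and the updated classifier, and it matches their empirical quantiles with a piecewise-linear map so that FA at a previously set threshold stays roughly unchanged. The function names (`accept_rates`, `build_histogram_map`) and the bin count are hypothetical and not taken from the paper.

```python
def accept_rates(scores, labels, threshold):
    """Return (CA, FA): the fractions of correct and of incorrect
    recognitions whose score is greater than the threshold."""
    correct = [s for s, ok in zip(scores, labels) if ok]
    wrong = [s for s, ok in zip(scores, labels) if not ok]
    ca = sum(s > threshold for s in correct) / len(correct)
    fa = sum(s > threshold for s in wrong) / len(wrong)
    return ca, fa


def quantiles(values, n_bins):
    """Interior empirical quantiles at k/n_bins for k = 1 .. n_bins-1."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[min(last, (k * len(ordered)) // n_bins)]
            for k in range(1, n_bins)]


def build_histogram_map(new_false, old_false, n_bins=10):
    """Assumed construction: align quantiles of the updated classifier's
    false-recognition scores with those of the old classifier, and
    interpolate linearly between the aligned points on [0, 1]."""
    xs = [0.0] + quantiles(new_false, n_bins) + [1.0]
    ys = [0.0] + quantiles(old_false, n_bins) + [1.0]

    def mapping(score):
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
            if score <= x1:
                if x1 == x0:  # degenerate bin, avoid division by zero
                    return y1
                return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
        return 1.0  # scores above 1.0 are clipped

    return mapping
```

In use, the mapping would be built once after an AM or CC update and applied to every new score before comparing it with the existing threshold, for example `accept = build_histogram_map(new_false, old_false)(score) > threshold`. Polynomial or tanh fitting would replace the piecewise-linear interpolation with a smooth parametric curve fitted to the same aligned points; their exact forms are not stated in the abstract.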

Bibliographic reference.  Kumar, Kshitiz / Liu, Chaojun / Gong, Yifan (2014): "Normalization of ASR confidence classifier scores via confidence mapping", In INTERSPEECH-2014, 1199-1203.