Interspeech'2005 - Eurospeech
In this paper, we propose two confidence measures (CMs) for speech recognition: one based on acoustic likelihood and the other on phone duration. For a decoded speech frame aligned to an HMM state, the acoustic-likelihood CM depends on where the frame's output likelihood falls within the probability distribution of likelihood values for that state. The CM of a whole phone is the geometric mean of the CMs of all its frames. The duration-based CM depends on the deviation of the observed duration from the expected duration of the recognized phone. The two CMs are combined using a weighted geometric mean to obtain a hybrid phone CM. The hybrid CM shows significant improvement over a CM based on the time-normalized log-likelihood score. On the TI-digits database, at a 20% false acceptance rate, the CM based on normalized acoustic log-likelihood has a detection rate of 83.8%, while the hybrid CM has a detection rate of 92.4%.
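The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the choices below are assumptions made for illustration only: the frame CM is taken as the empirical cumulative fraction of a state's training log-likelihoods that fall at or below the observed value, the duration CM is a Gaussian-shaped score of the deviation from the expected duration, and the weight of 0.5 is arbitrary. All function names and parameters are hypothetical and are not taken from the paper.

```python
import math
from bisect import bisect_right


def frame_cm(log_likelihood, sorted_state_log_likelihoods):
    """Relative position of a frame's log-likelihood within the state's
    distribution, estimated here as an empirical CDF (an assumption)."""
    rank = bisect_right(sorted_state_log_likelihoods, log_likelihood)
    return rank / len(sorted_state_log_likelihoods)


def geometric_mean(values, floor=1e-6):
    """Geometric mean computed in the log domain; values are floored
    so that a single zero does not force the result to zero."""
    logs = [math.log(max(v, floor)) for v in values]
    return math.exp(sum(logs) / len(logs))


def duration_cm(observed, expected, std):
    """Gaussian-shaped score (an assumption): 1.0 when the observed
    duration equals the expected one, decaying as the deviation grows."""
    z = (observed - expected) / std
    return math.exp(-0.5 * z * z)


def hybrid_cm(acoustic_cm, dur_cm, weight=0.5):
    """Weighted geometric mean of the acoustic and duration CMs."""
    return (acoustic_cm ** weight) * (dur_cm ** (1.0 - weight))


# Hypothetical per-state training log-likelihoods, sorted ascending.
state_scores = [-10.0, -8.0, -6.0, -4.0, -2.0]

# Log-likelihoods of the frames aligned to one recognized phone
# (all frames share one state here, purely to keep the example short).
frame_scores = [-5.0, -3.0, -7.0]

phone_acoustic_cm = geometric_mean(
    [frame_cm(score, state_scores) for score in frame_scores]
)
phone_duration_cm = duration_cm(observed=3, expected=4, std=2)
phone_hybrid_cm = hybrid_cm(phone_acoustic_cm, phone_duration_cm, weight=0.5)
```

A phone would then be accepted or rejected by comparing `phone_hybrid_cm` with a threshold; sweeping that threshold is what trades false acceptance rate against detection rate.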
Bibliographic reference. Pinto, Joel / Sitaram, R. N. V. (2005): "Confidence measures in speech recognition based on probability distribution of likelihoods", In INTERSPEECH-2005, 3001-3004.