7th International Conference on Spoken Language Processing
September 16-20, 2002
In large-vocabulary HMM-based recognition systems, the observation likelihoods provided by the acoustic models are useful in confidence measures if they are properly normalised. This paper compares two normalisation methods for the acoustic model likelihoods: unconstrained normalisation, based on the unconditional observation likelihood, and constrained normalisation, based on the observation likelihoods of a phoneme recognition system in which the phoneme strings are constrained by an N-gram phoneme sequence model. On the benchmark 20k-word Wall Street Journal recognition task, both normalisations appear at first sight to perform equally well. However, their behaviour depends on word length: constrained normalisation outperforms unconstrained normalisation for long words, and the opposite holds for short words. With a confidence measure that exploits this fact, the normalised cross entropy (NCE) metric for confidence measures increases from a reference value of 21.9% (with unconstrained normalisation) to 23.5%.
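To make the two normalisations and the evaluation metric concrete, the Python sketch below assumes that each normalised score is a per-frame log-likelihood ratio between the word's acoustic log-likelihood and a normalising log-likelihood. For unconstrained normalisation the normaliser is the unconditional observation likelihood; for constrained normalisation it is the likelihood from the N-gram-constrained phoneme recogniser. The functions `normalised_score` and `length_dependent_score`, and the threshold `long_word_min`, are hypothetical illustrations, not the authors' confidence measure. `normalised_cross_entropy` follows the standard NIST definition of NCE. It expects confidences already mapped to probabilities in (0, 1), a calibration step not shown here.

```python
import math
from typing import Sequence


def normalised_score(log_lik_word: float, log_lik_norm: float, n_frames: int) -> float:
    """Per-frame log-likelihood ratio (assumed form, for illustration only).

    log_lik_norm is either the unconditional observation log-likelihood
    (unconstrained) or the log-likelihood from a phoneme recogniser with an
    N-gram phoneme model (constrained), over the same frames as the word.
    """
    return (log_lik_word - log_lik_norm) / n_frames


def length_dependent_score(score_unconstrained: float, score_constrained: float,
                           n_phonemes: int, long_word_min: int = 6) -> float:
    """HYPOTHETICAL switch: constrained score for long words, unconstrained for short.

    long_word_min is an illustrative threshold, not a value from the paper.
    """
    return score_constrained if n_phonemes >= long_word_min else score_unconstrained


def normalised_cross_entropy(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """NIST-style normalised cross entropy of word confidence scores.

    confidences: estimated probability (0 < p < 1) that each hypothesised word is correct.
    correct: whether each hypothesised word was actually correct.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n_total = len(correct)
    n_correct = sum(correct)
    if n_total == 0 or n_correct in (0, n_total):
        raise ValueError("need both correct and incorrect words")
    p_c = n_correct / n_total
    # Baseline entropy: every word is given the same confidence p_c.
    h_max = -n_correct * math.log2(p_c) - (n_total - n_correct) * math.log2(1.0 - p_c)
    # Log2-likelihood of the observed correctness labels under the confidences.
    ll = sum(math.log2(p) if c else math.log2(1.0 - p)
             for p, c in zip(confidences, correct))
    return (h_max + ll) / h_max
```

An NCE of 0 means the scores carry no more information than giving every word the baseline probability of being correct; values closer to 1 indicate more informative confidence scores.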
Bibliographic reference. Duchateau, Jacques / Wambacq, Patrick (2002): "Unconstrained versus constrained acoustic normalisation in confidence scoring", In ICSLP-2002, 1617-1620.