Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Frame Level Likelihood Transformations for ASR and Utterance Verification

Konstantin P. Markov, Satoshi Nakamura

ATR Spoken Language Translation Research Labs, Soraku-gun, Kyoto, Japan

In most current HMM-based speech recognition systems, decoding and utterance verification methods use the state output likelihood as a measure of the acoustic match between the input data and the acoustic models. In this paper, we present a new and more general approach to forming the acoustic match score. The essence of this approach is to transform the likelihood of each acoustic vector with respect to any particular HMM state according to some non-linear function. We have investigated two types of such transformation functions. The first performs likelihood normalization, and the second transforms likelihoods into exponentially ordered weights. The transformed likelihoods are then used as new acoustic scores for decoding, recognition and verification in place of the conventional likelihoods. In our evaluation experiments we used the TIMIT database for phoneme recognition and verification, and a database of 710 speakers with a total of 4252 distinct words for isolated word recognition and verification. The results show that the transformed likelihood scores, on average, slightly increase recognition accuracy and reduce verification error rates by up to 30%.
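
The abstract does not give the exact form of the two transformation functions, so the following is only a minimal sketch of what frame-level transforms of this kind might look like. It assumes that normalization means dividing each state's likelihood by the sum over all states for the same frame, and that "exponentially ordered weights" means a weight that decays exponentially with the state's likelihood rank. The function names, the `alpha` parameter and the example values are hypothetical and are not taken from the paper.

```python
import math

def normalize_frame(log_likes):
    # Assumed normalization: subtract the log-sum-exp over all states,
    # so the linear-domain scores for this frame sum to 1.
    m = max(log_likes)
    lse = m + math.log(sum(math.exp(x - m) for x in log_likes))
    return [x - lse for x in log_likes]

def rank_weights(log_likes, alpha=1.0):
    # Assumed rank-based transform: the best-scoring state gets weight 1,
    # the next exp(-alpha), then exp(-2 * alpha), and so on.
    order = sorted(range(len(log_likes)),
                   key=lambda i: log_likes[i], reverse=True)
    weights = [0.0] * len(log_likes)
    for rank, i in enumerate(order):
        weights[i] = math.exp(-alpha * rank)
    return weights

# Made-up log-likelihoods of one acoustic vector for three HMM states.
frame = [-12.0, -10.0, -15.0]
norm = normalize_frame(frame)
weights = rank_weights(frame, alpha=0.5)
```

Under these assumptions, either output could replace the raw state log-likelihood as the per-frame acoustic score passed to the decoder or to a verification test. The transforms actually evaluated in the paper may differ.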



Bibliographic reference.  Markov, Konstantin P. / Nakamura, Satoshi (2000): "Frame level likelihood transformations for ASR and utterance verification", In ICSLP-2000, vol.2, 1038-1041.