Within-utterance correlation for speech recognition

Mats Blomberg

Relations between non-adjacent parts of an utterance are commonly regarded as an important source of information for speech recognition, yet they have seen little use in speech recognition systems. In this paper, we incorporate this information through joint distributions of pairs of phones occurring in the same utterance. In addition to relations between acoustic events, we have also incorporated relations between spectral and prosodically oriented information, such as phone duration, position in the utterance and fundamental frequency. Preliminary recognition results on N-best rescoring show a 10% word error reduction compared to a baseline Viterbi decoder.
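As a rough illustration of how pairwise phone information could enter N-best rescoring, the following minimal sketch adds a correlation term to each hypothesis score. It is not the model used in this paper: it assumes a single standardized scalar feature per phone (for example, normalized duration), a bivariate Gaussian for each phone pair, and a hypothetical table `rho_table` of pair correlations. The names `pair_gain`, `rescore` and `weight` are likewise introduced here for illustration only.

```python
import math
from itertools import combinations


def pair_gain(z_a, z_b, rho):
    """Log-likelihood gain of a bivariate Gaussian with correlation rho
    over two independent standard normals (assumed model, |rho| < 1)."""
    one_minus = 1.0 - rho * rho
    quad = rho * rho * (z_a * z_a + z_b * z_b) - 2.0 * rho * z_a * z_b
    return -0.5 * math.log(one_minus) - quad / (2.0 * one_minus)


def rescore(nbest, rho_table, weight=1.0):
    """Sort hypotheses by baseline score plus weighted pairwise gains.

    Each hypothesis is a dict with keys "words", "score" (baseline log
    score) and "phones" (list of (phone label, standardized feature)).
    Phone pairs missing from rho_table are treated as uncorrelated.
    """
    rescored = []
    for hyp in nbest:
        gain = 0.0
        # Visit every pair of phones in the utterance, adjacent or not.
        for (p_a, z_a), (p_b, z_b) in combinations(hyp["phones"], 2):
            rho = rho_table.get((p_a, p_b), rho_table.get((p_b, p_a), 0.0))
            gain += pair_gain(z_a, z_b, rho)
        rescored.append((hyp["score"] + weight * gain, hyp["words"]))
    return sorted(rescored, key=lambda item: item[0], reverse=True)


# Hypothetical two-entry N-best list: the second entry has the better
# baseline score, but its phone features disagree with the correlation.
nbest = [
    {"words": "x", "score": -10.0, "phones": [("a", 1.0), ("i", 1.0)]},
    {"words": "y", "score": -9.8, "phones": [("a", 1.0), ("i", -1.0)]},
]
rho_table = {("a", "i"): 0.5}
best_words = rescore(nbest, rho_table)[0][1]
```

With `weight=0.0` the ordering falls back to the baseline scores, so the weight controls how strongly the within-utterance term can reorder the list.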

