Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

Multiple Acoustic and Variability Estimation Models for ASR

Stéphane Dupont, Christophe Ris

Multitel, Mons, Belgium

In this paper, we present a formalism that makes it possible to use features representing both short-term and long-term speech behavior. This amounts to using multiple (specific, compensated or adapted) acoustic models, defined according to additional hidden variables that pertain not to the phonetic sequence but to long-term stable structures in the speech signal, such as speaker identity or speaking rate.

This formalism has been evaluated for recognition using vocal tract length (VTL) normalization. Features based on long-term pitch and formant measures, as well as PCA reductions of these, have been investigated and show significant correlation with the VTL. Speech recognition experiments performed on the children's portion of the TI-DIGITS database show improved accuracy with this technique compared to VTL selection based on the traditional Maximum Likelihood criterion.
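
The contrast between the two selection strategies mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the warp-factor grid, the Gaussian model linking a long-term feature to each warp factor, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical grid of candidate VTL warp factors (the hidden long-term variable).
WARP_FACTORS = [0.88, 0.94, 1.00, 1.06, 1.12]


def select_warp_ml(log_likelihoods):
    """Maximum Likelihood selection: keep the warp factor whose
    warped acoustic model gives the utterance the highest log-likelihood."""
    best = max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])
    return WARP_FACTORS[best]


def warp_posterior(long_term_feature, means, std):
    """Posterior over warp factors given one long-term feature
    (for example a mean pitch value). Assumes, for illustration only,
    a Gaussian p(feature | warp) with shared std and a uniform prior."""
    log_p = [-0.5 * ((long_term_feature - m) / std) ** 2 for m in means]
    top = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def select_warp_from_features(long_term_feature, means, std):
    """Feature-based selection: keep the warp factor with the highest
    posterior under the variability estimation model above."""
    post = warp_posterior(long_term_feature, means, std)
    best = max(range(len(post)), key=lambda i: post[i])
    return WARP_FACTORS[best]


# Toy values: per-warp log-likelihoods of one utterance, and the
# hypothetical mean feature value associated with each warp factor.
utterance_scores = [-120.0, -118.5, -119.0, -121.0, -125.0]
feature_means = [300.0, 260.0, 220.0, 180.0, 140.0]

ml_choice = select_warp_ml(utterance_scores)
feature_choice = select_warp_from_features(250.0, feature_means, 30.0)
```

In this sketch the ML criterion needs one scoring pass per candidate warp factor, whereas the feature-based criterion only needs the long-term feature; the posterior could also be used to weight several acoustic models rather than to pick a single one.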

Bibliographic reference. Dupont, Stéphane / Ris, Christophe (2006): "Multiple acoustic and variability estimation models for ASR", in SRIV-2006, 107-112.