9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Region-Based Vocal Tract Length Normalization for ASR

Michail G. Maragakis, Alexandros Potamianos

Technical University of Crete, Greece

In this paper, we propose a Region-based multi-parametric Vocal Tract Length Normalization (R-VTLN) algorithm for automatic speech recognition (ASR). The proposed algorithm extends the well-established mono-parametric utterance-based VTLN algorithm of Lee and Rose [1] by dividing the speech frames of a test utterance into regions and independently warping the features of each region using a maximum likelihood criterion. We propose two algorithms for classifying frames into regions: an unsupervised clustering algorithm, and an unsupervised algorithm that assigns frames to regions based on phonetic-class labels obtained from a first recognition pass. We also investigate how well various mono-parametric and multi-parametric warping functions reduce the spectral distance between two speakers, as a function of phone. R-VTLN is shown to significantly outperform mono-parametric VTLN in terms of word accuracy on the AURORA4 database.
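
The per-region warping idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear frequency warp applied directly to magnitude spectra, a single diagonal-covariance Gaussian as a stand-in for the acoustic model, and a small grid of candidate warping factors. The names `warp_spectrum`, `log_likelihood`, and `region_vtln`, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def warp_spectrum(spec, alpha):
    """Linearly warp the frequency axis of one spectrum by factor alpha.

    Assumption: a plain linear warp; np.interp clamps at the band edges.
    """
    bins = np.arange(len(spec))
    return np.interp(alpha * bins, bins, spec)


def log_likelihood(frames, mean, var):
    """Total log-likelihood under a diagonal Gaussian (toy acoustic model)."""
    diff = frames - mean
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var)))


def region_vtln(frames, regions, mean, var, alphas):
    """Choose one warping factor per region by maximum-likelihood grid search."""
    best = {}
    for r in np.unique(regions):
        region_frames = frames[regions == r]
        scores = []
        for a in alphas:
            warped = np.array([warp_spectrum(f, a) for f in region_frames])
            scores.append(log_likelihood(warped, mean, var))
        best[int(r)] = float(alphas[int(np.argmax(scores))])
    return best


# Toy data (hypothetical): a smooth spectral template and two regions whose
# frames are the template stretched by different amounts.
def template(x):
    return np.exp(-((x - 30.0) / 8.0) ** 2)


k = np.arange(64)
mean = template(k)               # model mean is the unwarped template
var = np.full(64, 0.01)          # fixed variance per frequency bin

region0 = np.tile(template(k / 0.9), (3, 1))  # undone by alpha = 0.9
region1 = np.tile(template(k / 1.1), (3, 1))  # undone by alpha = 1.1
frames = np.vstack([region0, region1])
regions = np.array([0, 0, 0, 1, 1, 1])        # frame-to-region labels

alphas = np.array([0.9, 1.0, 1.1])
result = region_vtln(frames, regions, mean, var, alphas)
print(result)
```

In this sketch the region labels are given directly. In the setting the abstract describes, they would come from clustering or from phonetic-class labels produced by a first recognition pass, and a mono-parametric baseline corresponds to assigning every frame to a single region.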

Bibliographic reference. Maragakis, Michail G. / Potamianos, Alexandros (2008): "Region-based vocal tract length normalization for ASR", in INTERSPEECH-2008, 1365-1368.