ISCA Archive Interspeech 2008

Region-based vocal tract length normalization for ASR

Michail G. Maragakis, Alexandros Potamianos

In this paper, we propose a region-based, multi-parametric Vocal Tract Length Normalization (R-VTLN) algorithm for automatic speech recognition (ASR). The proposed algorithm extends the well-established mono-parametric, utterance-based VTLN algorithm of Lee and Rose [1]: it divides the speech frames of a test utterance into regions and warps the features of each region independently, selecting each warp with a maximum likelihood criterion. We propose two unsupervised algorithms for classifying frames into regions: a clustering algorithm, and an algorithm that assigns frames to regions based on phonetic-class labels obtained from a first recognition pass. We also investigate, as a function of phone, how well various mono-parametric and multi-parametric warping functions reduce the spectral distance between two speakers. R-VTLN is shown to significantly outperform mono-parametric VTLN in word accuracy on the AURORA4 database.
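The region-wise selection of warping factors described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the piecewise-linear warping function, the grid of candidate factors, the names `warp_frequency` and `select_region_warps`, and the toy scoring function are all assumptions made for illustration. In a real system the score would be the acoustic-model log-likelihood of the warped features, and the region labels would come from clustering or from a first recognition pass.

```python
def warp_frequency(f, alpha, f_max, cut_ratio=0.875):
    """Piecewise-linear warp (an assumed form): scale by alpha below a
    breakpoint, then interpolate linearly so that f_max maps to f_max."""
    f0 = cut_ratio * f_max
    if f <= f0:
        return alpha * f
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


def select_region_warps(frames, regions, log_likelihood, alphas):
    """For each region, pick the warping factor that maximizes the summed
    log-likelihood of that region's frames (grid search)."""
    best = {}
    for region in sorted(set(regions)):
        idx = [i for i, label in enumerate(regions) if label == region]
        best[region] = max(
            alphas,
            key=lambda a: sum(log_likelihood(frames[i], a) for i in idx),
        )
    return best


# Toy data: each "frame" is a single spectral-peak frequency in Hz, and the
# hypothetical model expects a fixed target frequency for each region.
F_MAX = 4000.0
frames = [1000.0, 1000.0, 2000.0, 2000.0]
regions = [0, 0, 1, 1]
targets = {0: 1100.0, 1: 1800.0}
frame_region = dict(zip(range(len(frames)), regions))


def toy_log_likelihood(frame, alpha):
    # Stand-in score: negative squared distance between the warped peak
    # and the target of the region that this frame value belongs to.
    region = 0 if frame == 1000.0 else 1
    return -(warp_frequency(frame, alpha, F_MAX) - targets[region]) ** 2


# Candidate warping factors from 0.88 to 1.12 in steps of 0.02 (assumed grid).
alphas = [round(0.88 + 0.02 * k, 2) for k in range(13)]
best_warps = select_region_warps(frames, regions, toy_log_likelihood, alphas)
print(best_warps)
```

Under these assumptions, assigning every frame to a single region reduces the search to one warping factor per utterance, which corresponds to the mono-parametric, utterance-based setting that R-VTLN extends.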

