In this paper, we propose a Region-based multi-parametric Vocal Tract Length Normalization (R-VTLN) algorithm for automatic speech recognition (ASR). The proposed algorithm extends the well-established mono-parametric, utterance-based VTLN algorithm of Lee and Rose by dividing the speech frames of a test utterance into regions and independently warping the features of each region, with each warp selected by a maximum likelihood criterion. We propose two algorithms for classifying frames into regions: an unsupervised clustering algorithm, and an unsupervised algorithm that assigns frames to regions based on phonetic-class labels obtained from a first recognition pass. We also investigate, as a function of phone, the ability of various mono-parametric and multi-parametric warping functions to reduce the spectral distance between two speakers. R-VTLN is shown to significantly outperform mono-parametric VTLN in terms of word accuracy on the AURORA4 database.
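The per-region warp selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the warping functions, features, or acoustic models, so the sketch assumes a linear frequency warp applied to magnitude spectra, a single diagonal Gaussian as a stand-in for the recognizer's acoustic model, and a grid search over candidate warp factors. All function and parameter names (`warp_spectrum`, `gaussian_loglik`, `region_vtln`, `alphas`) are hypothetical, and the region labels are taken as given rather than produced by clustering or a first recognition pass.

```python
import numpy as np

def warp_spectrum(spec, alpha):
    """Linearly warp the frequency axis of one magnitude spectrum (assumed warp form)."""
    n = spec.shape[-1]
    bins = np.arange(n)
    # Read the original spectrum at alpha-scaled bin positions, clipped to the valid range.
    src = np.clip(alpha * bins, 0, n - 1)
    return np.interp(src, bins, spec)

def gaussian_loglik(frames, mean, var):
    """Total log-likelihood of frames under a diagonal Gaussian (stand-in acoustic model)."""
    diff = frames - mean
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var)))

def region_vtln(spectra, regions, mean, var, alphas):
    """Choose one warp factor per region by grid search on the likelihood.

    spectra: (num_frames, num_bins) array; regions: (num_frames,) integer labels.
    Returns the warped spectra and a dict mapping region label to chosen factor.
    """
    warped = np.empty(spectra.shape, dtype=float)
    chosen = {}
    for r in np.unique(regions):
        idx = np.where(regions == r)[0]
        best_alpha, best_ll, best_frames = None, -np.inf, None
        for a in alphas:
            # Warp every frame of this region with the same candidate factor.
            cand = np.stack([warp_spectrum(s, a) for s in spectra[idx]])
            ll = gaussian_loglik(cand, mean, var)
            if ll > best_ll:
                best_alpha, best_ll, best_frames = a, ll, cand
        warped[idx] = best_frames
        chosen[int(r)] = float(best_alpha)
    return warped, chosen
```

Passing a single region label for all frames reduces this sketch to mono-parametric, utterance-level warp selection, which is the baseline the region-based variant extends.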
Bibliographic reference. Maragakis, Michail G. / Potamianos, Alexandros (2008): "Region-based vocal tract length normalization for ASR", In INTERSPEECH-2008, 1365-1368.