Vocal tract length normalization (VTLN) is commonly applied per utterance, using a warping function that assumes a linear dependence between the vocal tract length and the location of the formants. In this work, we propose a data-driven method for improving the performance of systems that already use standard VTLN. The method uses elastic registration to estimate optimal nonparametric transformations that further reduce inter-speaker variability. Results show that the proposed method can raise the performance of monophone systems to that of a triphone system.
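For readers unfamiliar with the linear warping assumption behind standard VTLN, the following is a minimal sketch of it in Python. It illustrates only the baseline warping and not the elastic registration method proposed in this work. The function names `linear_warp` and `warp_spectrum`, the warping factor `alpha`, and the clipping at the upper band edge are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def linear_warp(freqs, alpha, upper_edge):
    """Scale frequencies by alpha and clip them at the upper band edge.

    alpha is a hypothetical per-utterance warping factor; practical
    systems often use a piecewise-linear variant instead of clipping.
    """
    return np.minimum(alpha * np.asarray(freqs, dtype=float), upper_edge)


def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum at linearly warped bin positions."""
    spectrum = np.asarray(spectrum, dtype=float)
    bins = np.arange(len(spectrum), dtype=float)
    # Output bin k takes the (interpolated) input value at position alpha * k.
    warped_bins = linear_warp(bins, alpha, bins[-1])
    return np.interp(warped_bins, bins, spectrum)
```

In such a scheme, `alpha` would typically be chosen per utterance, for example by searching a small grid of candidate values and keeping the one that best fits the acoustic model. A nonparametric transformation, as estimated by the proposed method, is not restricted to this single-parameter form.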
Index Terms: automatic speech recognition, vocal tract length normalization, elastic registration
Bibliographic reference. Müller, Florian / Mertins, Alfred (2012): "Enhancing vocal tract length normalization with elastic registration for automatic speech recognition", In INTERSPEECH-2012, 1364-1367.