13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition

Florian Müller, Alfred Mertins

Institute for Signal Processing, University of Lübeck, Lübeck, Germany

Vocal tract length normalization (VTLN) is commonly applied utterance-wise with a warping function that assumes a linear dependence between the vocal tract length and the locations of the formants. In this work, we propose a data-driven method for enhancing the performance of systems that already use standard VTLN. The method uses elastic registration to estimate optimal nonparametric transformations that further reduce inter-speaker variability. Results show that the proposed method can raise the performance of monophone systems to that of a triphone system.
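
The linear warping assumption behind standard VTLN can be illustrated with a minimal sketch. The function names, the warping factor `alpha`, and the 8 kHz upper band limit are illustrative assumptions and are not taken from the paper; the elastic registration step proposed by the authors is not shown here.

```python
def linear_vtln_warp(freq_hz, alpha, nyquist_hz=8000.0):
    """Scale one frequency by a speaker-specific factor alpha.

    Assumption: a purely linear warp f' = alpha * f, clipped to the
    band [0, nyquist_hz]. Practical systems often use a piecewise-linear
    variant near the band edge; that refinement is omitted here.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    warped = alpha * freq_hz
    # Keep the warped frequency inside the representable band.
    return min(max(warped, 0.0), nyquist_hz)


def warp_formants(formants_hz, alpha, nyquist_hz=8000.0):
    """Apply the same linear warp to every formant of an utterance."""
    return [linear_vtln_warp(f, alpha, nyquist_hz) for f in formants_hz]
```

Under this assumption, one factor per utterance moves all formants proportionally, which is the kind of constraint that a nonparametric transformation would relax.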

Index Terms: automatic speech recognition, vocal tract length normalization, elastic registration

Bibliographic reference. Müller, Florian / Mertins, Alfred (2012): "Enhancing vocal tract length normalization with elastic registration for automatic speech recognition", In INTERSPEECH-2012, 1364-1367.