14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

VTLN Based on the Linear Interpolation of Contiguous Mel Filter-Bank Energies

Néstor Becerra Yoma (1), Claudio Garretón (1), Fernando Huenupán (2), Ignacio Catalán (1), Jorge Wuth (1)

(1) Universidad de Chile, Chile
(2) Universidad de La Frontera, Chile

This paper describes a novel feature-space VTLN method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The technique aims to reduce the distortion in the Mel filter-bank energy estimates that arises, when the central frequencies of the band-pass filters are shifted, from the harmonic structure of voiced speech intervals and from DFT sampling. In a medium-vocabulary continuous speech recognition task, the proposed interpolated filter-bank energy-based VTLN leads to relative reductions in WER as high as 11.2% and 7.6% when compared with the baseline system and standard VTLN, respectively. The new scheme also yields a significant WER reduction of 7% when compared with state-of-the-art VTLN methods based on linear transforms in the cepstral space. The warping factor estimated here depends more on the speaker, and less on the acoustic-phonetic content, than the warping factor in state-of-the-art VTLN techniques.
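The central idea, approximating the energies of a warped filter bank by interpolating between contiguous unwarped filter-bank energies instead of recomputing them with shifted filters, can be illustrated with a minimal sketch. This is not the authors' exact formulation: the helper names (`mel_centers`, `interpolate_energies`), the common Mel formula with constants 2595 and 700, interpolation on the Hz axis, the convention that the warping factor `alpha` multiplies the central frequencies, and the clamping at the band edges are all assumptions made for illustration.

```python
import bisect
import math


def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed; the abstract does not specify one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centers(n_filters, f_low, f_high):
    """Central frequencies (Hz) of n_filters filters equally spaced in Mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def interpolate_energies(energies, centers, alpha):
    """Approximate the energies of a filter bank whose central frequencies
    are alpha * centers[m], using linear interpolation between the two
    contiguous unwarped filters that bracket each warped central frequency.
    Targets outside the range of centers are clamped to the edge energy
    (a hypothetical boundary rule)."""
    warped = []
    for c in centers:
        target = alpha * c
        if target <= centers[0]:
            warped.append(energies[0])
        elif target >= centers[-1]:
            warped.append(energies[-1])
        else:
            # k satisfies centers[k] <= target < centers[k + 1]
            k = bisect.bisect_right(centers, target) - 1
            w = (target - centers[k]) / (centers[k + 1] - centers[k])
            warped.append((1.0 - w) * energies[k] + w * energies[k + 1])
    return warped


if __name__ == "__main__":
    centers = mel_centers(10, 0.0, 4000.0)
    # Toy energies for one frame (illustrative values, not real data).
    frame = [float(m + 1) for m in range(10)]
    print(interpolate_energies(frame, centers, 1.05))
```

Because the warped energies are weighted sums of energies already computed on the fixed filter bank, no filter is re-evaluated on the DFT bins, which is the property the abstract credits with reducing distortion from harmonic structure and DFT sampling. In a full system, `alpha` would typically be chosen per speaker by a search over candidate values, a step this sketch leaves out.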

