ISCA Archive Interspeech 2013

VTLN based on the linear interpolation of contiguous mel filter-bank energies

Néstor Becerra Yoma, Claudio Garretón, Fernando Huenupán, Ignacio Catalán, Jorge Wuth

This paper describes a novel feature-space VTLN method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The presented technique aims to reduce the distortion in the Mel filter-bank energy estimation that arises from the harmonic composition of voiced speech intervals and from DFT sampling when the center frequencies of the band-pass filters are shifted. The presented interpolated filter-bank energy-based VTLN leads to relative reductions in WER as high as 11.2% and 7.6% when compared with the baseline system and standard VTLN, respectively, in a medium-vocabulary continuous speech recognition task. This new scheme also provides a significant reduction in WER of 7% when compared with state-of-the-art VTLN methods based on linear transforms in the cepstral space. The warping factor estimated here depends more on the speaker and less on the acoustic-phonetic content than the warping factor in state-of-the-art VTLN techniques.
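The abstract does not give the interpolation formula, so the following is only a minimal sketch of the general idea: the energy of a filter whose center frequency is shifted by a fraction of the filter spacing is approximated as a weighted sum of the energies of the two contiguous filters, instead of recomputing the filter bank from the DFT. The function name `interpolate_filterbank`, the `shift` parameter (expressed in units of filter spacing), and the edge-clamping rule are illustrative assumptions, not the authors' formulation.

```python
def interpolate_filterbank(energies, shift):
    """Approximate filter-bank energies after shifting every filter center.

    energies: list of (log) Mel filter-bank energies for one frame.
    shift:    hypothetical warping parameter in units of filter spacing,
              restricted to [-1, 1]; positive values move centers upward.
    """
    if not -1.0 <= shift <= 1.0:
        raise ValueError("shift must lie in [-1, 1]")
    n = len(energies)
    step = 1 if shift >= 0 else -1   # which contiguous filter to blend with
    weight = abs(shift)              # interpolation weight for the neighbor
    warped = []
    for m in range(n):
        # Assumption: at the band edges, reuse the edge filter itself.
        neighbor = min(max(m + step, 0), n - 1)
        warped.append((1.0 - weight) * energies[m] + weight * energies[neighbor])
    return warped


frame = [1.0, 2.0, 4.0]
print(interpolate_filterbank(frame, 0.5))   # blend each filter with the next one
print(interpolate_filterbank(frame, -0.5))  # blend each filter with the previous one
```

In a VTLN setting, a per-speaker value of `shift` would typically be chosen by searching a small grid and keeping the value that maximizes the acoustic-model likelihood; that search is not shown here and is likewise an assumption about how such a parameter might be estimated.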


doi: 10.21437/Interspeech.2013-738

Cite as: Yoma, N.B., Garretón, C., Huenupán, F., Catalán, I., Wuth, J. (2013) VTLN based on the linear interpolation of contiguous mel filter-bank energies. Proc. Interspeech 2013, 3337-3341, doi: 10.21437/Interspeech.2013-738

@inproceedings{yoma13_interspeech,
  author={Néstor Becerra Yoma and Claudio Garretón and Fernando Huenupán and Ignacio Catalán and Jorge Wuth},
  title={{VTLN based on the linear interpolation of contiguous mel filter-bank energies}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3337--3341},
  doi={10.21437/Interspeech.2013-738}
}