ISCA Archive Interspeech 2011

VTLN in the MFCC domain: band-limited versus local interpolation

Ehsan Variani, Thomas Schaaf

We propose a new, easy-to-implement method for computing a Linear Transform (LT) that performs Vocal Tract Length Normalization (VTLN) on the truncated Mel Frequency Cepstral Coefficients (MFCCs) normally used in distributed speech recognition. The method is based on a local interpolation that is independent of the Mel filter design. Local Interpolation VTLN (LILT-VTLN) is compared, theoretically and experimentally, with a global scheme based on band-limited interpolation (BLI-VTLN) and with the conventional frequency warping scheme (FFT-VTLN). An investigation of the interoperability of these methods shows that the performance of LILT-VTLN is on par with that of FFT-VTLN and BLI-VTLN. A statistical significance test also shows no significant differences between FFT-VTLN, LILT-VTLN, and BLI-VTLN, even when the models and front ends do not match.
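
The general idea of a cepstral-domain linear transform built from local interpolation can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an orthonormal DCT-II between log Mel energies and cepstra, 24 filterbank channels, 13 retained cepstra, and, as a deliberate simplification, a warp that scales the filterbank channel index by a factor `alpha` rather than warping linear frequency. The function names `dct_matrix`, `local_interp_matrix`, and `vtln_transform` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n): log Mel energies -> cepstra."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (m + 0.5) / n)
    D[0, :] /= np.sqrt(2.0)  # scale first row so that D @ D.T == I
    return D

def local_interp_matrix(n, alpha):
    """Linear interpolation between the two neighbouring channels at
    the warped position alpha * i (assumed, simplified warp)."""
    W = np.zeros((n, n))
    for i in range(n):
        pos = min(alpha * i, n - 1)   # clamp to the last channel
        lo = int(np.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        W[i, lo] += 1.0 - frac
        W[i, hi] += frac
    return W

def vtln_transform(n_ceps, n_filters, alpha):
    """Cepstral-domain transform A = D W D^T, truncated to n_ceps.
    Truncation treats the discarded higher cepstra as zero."""
    D = dct_matrix(n_filters)
    W = local_interp_matrix(n_filters, alpha)
    A = D @ W @ D.T                   # D is orthonormal, so D^-1 == D.T
    return A[:n_ceps, :n_ceps]

A_identity = vtln_transform(13, 24, 1.0)  # no warp
A_warped = vtln_transform(13, 24, 1.1)    # illustrative warp factor
```

With `alpha = 1.0` the interpolation matrix is the identity, so the transform leaves the cepstra unchanged. For other warp factors, a warped feature vector would be obtained as `A_warped @ c` for a 13-dimensional MFCC vector `c`.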

doi: 10.21437/Interspeech.2011-104

Cite as: Variani, E., Schaaf, T. (2011) VTLN in the MFCC domain: band-limited versus local interpolation. Proc. Interspeech 2011, 1273-1276, doi: 10.21437/Interspeech.2011-104

  author={Ehsan Variani and Thomas Schaaf},
  title={{VTLN in the MFCC domain: band-limited versus local interpolation}},
  booktitle={Proc. Interspeech 2011},