We propose a new, easy-to-implement method to compute a Linear Transform (LT) that performs Vocal Tract Length Normalization (VTLN) on the truncated Mel Frequency Cepstral Coefficients (MFCCs) normally used in distributed speech recognition. The method is based on local interpolation and is independent of the Mel filter design. Local interpolation VTLN (LILT-VTLN) is compared, theoretically and experimentally, to a global scheme based on band-limited interpolation (BLI-VTLN) and to the conventional frequency warping scheme (FFT-VTLN). An investigation of the interoperability of these methods shows that the performance of LILT-VTLN is on par with that of FFT-VTLN and BLI-VTLN. A statistical significance test also shows no significant differences among FFT-VTLN, LILT-VTLN, and BLI-VTLN, even when the models and front ends do not match.
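To make the idea of a VTLN linear transform in the MFCC domain concrete, the following is a minimal sketch and not the authors' implementation. It assumes an unnormalized DCT-II for the cepstral transform, recovers approximate log Mel energies with a pseudo-inverse, and applies the warp directly to the Mel channel index by linear interpolation between the two nearest channels. The function names, the warp factor `alpha`, and the sizes (13 cepstra, 23 channels) are illustrative assumptions.

```python
import numpy as np

def dct_matrix(num_ceps, num_filters):
    """Truncated DCT-II matrix mapping log Mel energies to MFCCs (assumed form)."""
    n = np.arange(num_filters)
    k = np.arange(num_ceps)[:, None]
    return np.cos(np.pi * k * (n + 0.5) / num_filters)

def local_interp_matrix(num_filters, alpha):
    """Row i linearly interpolates the two channels nearest to alpha * i.

    Assumption: the warp acts on the channel index, which is a simplification
    of warping the frequency axis itself.
    """
    W = np.zeros((num_filters, num_filters))
    for i in range(num_filters):
        pos = min(alpha * i, num_filters - 1)  # clamp to the last channel
        lo = int(np.floor(pos))
        hi = min(lo + 1, num_filters - 1)
        frac = pos - lo
        W[i, lo] += 1.0 - frac
        W[i, hi] += frac
    return W

def vtln_transform(num_ceps, num_filters, alpha):
    """Linear transform A such that warped_mfcc is approximately A @ mfcc."""
    C = dct_matrix(num_ceps, num_filters)
    W = local_interp_matrix(num_filters, alpha)
    # pinv(C) maps truncated cepstra back to smoothed log Mel energies
    return C @ W @ np.linalg.pinv(C)

# Hypothetical usage: 13 cepstra, 23 Mel channels, warp factor 1.1
A = vtln_transform(13, 23, 1.1)
mfcc = np.random.default_rng(0).standard_normal(13)
warped = A @ mfcc
```

Because the transform is a fixed matrix per warp factor, it can be applied to already-truncated MFCCs without access to the spectrum, which is the situation the abstract describes for distributed speech recognition.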
Cite as: Variani, E., Schaaf, T. (2011) VTLN in the MFCC domain: band-limited versus local interpolation. Proc. Interspeech 2011, 1273-1276, doi: 10.21437/Interspeech.2011-104
@inproceedings{variani11_interspeech,
  author={Ehsan Variani and Thomas Schaaf},
  title={{VTLN in the MFCC domain: band-limited versus local interpolation}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={1273--1276},
  doi={10.21437/Interspeech.2011-104}
}