ISCA Archive ICSLP 2000

Integrating the energy information into MFCC

Fang Zheng, Guoliang Zhang

The Mel-Frequency Cepstrum Coefficients (MFCC), introduced in 1980 by Davis and Mermelstein [1], are a widely used feature set in automatic speech recognition systems. In the traditional implementation, the 0th coefficient is excluded on the grounds that it is somewhat unreliable. In this paper, we analyze this term and find that it can be regarded as the generalized frequency band energy (FBE) and is hence useful; including it yields the FBE-MFCC. We also propose a better analysis of the frame energy, called auto-regressive analysis, which performs better than its 1st- and/or 2nd-order differential derivatives. Experiments show that the FBE-MFCC and the frame energy, together with their corresponding auto-regressive analysis coefficients, form a better combination, reducing the syllable error rate (SER) by 10.0% across a very large speech database compared to the traditional MFCC with its corresponding auto-regressive analysis coefficients.
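To illustrate why the 0th coefficient can be read as a band-energy term, the following minimal sketch computes cepstral coefficients as an unnormalized DCT-II of log filter-bank energies. This is the common textbook formulation, assumed here for illustration; the abstract does not give the paper's exact definition of the FBE-MFCC. The function name `cepstrum_from_log_fbank` and the toy energies are hypothetical. Because the k = 0 cosine basis vector is all ones, c0 reduces to the sum of the log filter-bank energies.

```python
import numpy as np

def cepstrum_from_log_fbank(log_fbe, num_ceps):
    """Unnormalized DCT-II of log filter-bank energies (assumed formulation).

    log_fbe  : 1-D array of log mel filter-bank energies for one frame
    num_ceps : number of coefficients to return, starting at c0
    """
    log_fbe = np.asarray(log_fbe, dtype=float)
    m = log_fbe.shape[-1]                        # number of filter-bank channels
    n = np.arange(m)                             # channel index
    k = np.arange(num_ceps)[:, None]             # cepstral index, as a column
    basis = np.cos(np.pi * k * (n + 0.5) / m)    # shape (num_ceps, m)
    return basis @ log_fbe

# Toy filter-bank energies for a single frame (made-up values).
fbe = np.array([1.0, 2.0, 4.0, 8.0])
ceps = cepstrum_from_log_fbank(np.log(fbe), num_ceps=3)

# c0 equals the sum of the log band energies, i.e. log of their product.
c0 = ceps[0]
```

Under this formulation, discarding c0 discards an overall energy measure of the frame, which is consistent with the abstract's argument that the term carries useful information.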

[1] Davis, S.B. and Mermelstein, P. (1980), “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Trans. on ASSP, Aug. 1980.

doi: 10.21437/ICSLP.2000-96

Cite as: Zheng, F., Zhang, G. (2000) Integrating the energy information into MFCC. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 389-392, doi: 10.21437/ICSLP.2000-96

  author={Fang Zheng and Guoliang Zhang},
  title={{Integrating the energy information into MFCC}},
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 389-392},