Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Integrating the Energy Information into MFCC

Fang Zheng, Guoliang Zhang

Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, China

The Mel-Frequency Cepstral Coefficients (MFCC), introduced in 1980 by Davis and Mermelstein [1], are a widely used feature set in automatic speech recognition systems. In the traditional implementation, the 0th coefficient is excluded because it is considered somewhat unreliable. In this paper, we analyze this term and find that it can be regarded as a generalized frequency band energy (FBE) and is therefore useful; including it yields the FBE-MFCC. We also propose an auto-regressive analysis of the frame energy, which performs better than the frame energy's first- and/or second-order derivatives. Experiments on a very large speech database show that the FBE-MFCC and the frame energy, together with their corresponding auto-regressive analysis coefficients, form a better combination: compared with the traditional MFCC and its corresponding auto-regressive analysis coefficients, they reduce the syllable error rate (SER) by 10.0%.
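The interpretation of the 0th coefficient as a band-energy term can be checked directly. The following is a minimal sketch. It assumes the common unnormalized DCT-II of log mel filter-bank energies. Practical front ends often apply an orthonormal scale factor, which changes c0 only by a constant. Because cos(0) = 1, c0 is the sum of the log band energies, that is, the log of their product. Multiplying every band energy by a gain g therefore shifts c0 by M log g and leaves the higher coefficients unchanged. The function names are hypothetical. The `delta` function is the widely used regression-based first-order difference, shown only as the kind of derivative baseline the abstract compares against. The paper's own auto-regressive analysis is not specified in this abstract and is not reproduced here.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: c_k = sum_m x_m * cos(pi * k * (m + 0.5) / M)."""
    M = len(x)
    return [
        sum(x[m] * math.cos(math.pi * k * (m + 0.5) / M) for m in range(M))
        for k in range(M)
    ]


def cepstrum_from_band_energies(band_energies):
    """Hypothetical helper: cepstral coefficients c_0..c_{M-1} from mel
    filter-bank energies (all must be > 0). c_0 equals sum(log E_m)."""
    log_e = [math.log(e) for e in band_energies]
    return dct_ii(log_e)


def delta(frames, N=2):
    """Regression-based first-order difference of a per-frame scalar track
    (e.g. c0 or frame log energy):
        d_t = sum_{n=1..N} n * (x_{t+n} - x_{t-n}) / (2 * sum_{n=1..N} n^2),
    with edge frames repeated to pad the sequence."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        acc = 0.0
        for n in range(1, N + 1):
            nxt = frames[min(t + n, T - 1)]  # clamp at the last frame
            prv = frames[max(t - n, 0)]      # clamp at the first frame
            acc += n * (nxt - prv)
        out.append(acc / denom)
    return out


if __name__ == "__main__":
    energies = [1.0, 2.0, 4.0, 8.0]            # toy 4-band example
    c = cepstrum_from_band_energies(energies)
    print(c[0], math.log(64.0))                # c0 = log(1*2*4*8)
    scaled = cepstrum_from_band_energies([3.0 * e for e in energies])
    print(scaled[0] - c[0], 4 * math.log(3.0))  # only c0 moves, by M*log(g)
    print(delta([0, 1, 2, 3, 4, 5, 6]))        # interior values are 1.0
```

Real front ends use roughly 20 to 40 filters rather than four. The gain experiment still illustrates why c0 behaves like an energy measure and why the remaining coefficients are insensitive to overall level.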

Reference

  1. Davis, S.B. and Mermelstein, P. (1980), “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Trans. on ASSP, vol. 28, no. 4, pp. 357-366, Aug. 1980.



Bibliographic reference. Zheng, Fang / Zhang, Guoliang (2000): "Integrating the energy information into MFCC", In ICSLP-2000, vol. 1, pp. 389-392.