ISCA Archive ICSLP 2000

Filterbank-based feature extraction for speech recognition and its application to voice mail transcription

Jun Huang, Mukund Padmanabhan

In this paper, we propose a filterbank-based technique to extract more robust and discriminative features for telephony speech recognition. First, we propose an extended Lerner grouping method to approximate the shape of the Mel filters used in MFCC while reducing the cross-correlation between filterbank outputs. Then we use Welch processing to reduce the variance of the spectral features while retaining the spectral resolution. Finally, we describe experiments in which we augment the cepstral features with formant-related features, computed using an adaptive filterbank. The new features represent the trajectory of the frequency components within different formant bands. Experimental results show that Welch processing consistently improves the word error rate on a large-vocabulary voice mail transcription task and that the formant-related features provide higher discriminability than the MFCC features.
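
The variance-reduction idea behind Welch processing can be illustrated with a minimal sketch of the textbook Welch estimator, which averages windowed periodograms of overlapping sub-segments. This is not the authors' implementation: the function name, the Hann window, the segment length, and the hop size are all assumptions chosen for illustration. The textbook form shown here trades some frequency resolution for lower variance, and the abstract does not say how the paper retains resolution.

```python
import numpy as np

def welch_power_spectrum(frame, seg_len=128, hop=64):
    """Average windowed periodograms of overlapping sub-segments of one frame.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    window = np.hanning(seg_len)
    scale = np.sum(window ** 2)  # normalize by window energy
    starts = range(0, len(frame) - seg_len + 1, hop)
    periodograms = [
        np.abs(np.fft.rfft(frame[s:s + seg_len] * window)) ** 2 / scale
        for s in starts
    ]
    return np.mean(periodograms, axis=0)

# Synthetic white noise stands in for a speech frame (assumption).
rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)

# One segment spanning the whole frame is a plain windowed periodogram.
single = welch_power_spectrum(frame, seg_len=1024, hop=1024)
# Fifteen half-overlapping segments give a smoother, lower-variance estimate.
averaged = welch_power_spectrum(frame, seg_len=128, hop=64)
```

For white noise the true spectrum is flat, so the spread of the estimate across frequency bins reflects estimator variance; comparing `np.var(averaged)` with `np.var(single)` shows the effect of averaging.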


Cite as: Huang, J., Padmanabhan, M. (2000) Filterbank-based feature extraction for speech recognition and its application to voice mail transcription. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 668-671

@inproceedings{huang00g_icslp,
  author={Jun Huang and Mukund Padmanabhan},
  title={{Filterbank-based feature extraction for speech recognition and its application to voice mail transcription}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 4, 668-671}
}