Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Filterbank-Based Feature Extraction for Speech Recognition and Its Application to Voice Mail Transcription

Jun Huang (1), Mukund Padmanabhan (2)

(1) University of Illinois, Urbana, IL, USA
(2) IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

In this paper, we propose a filterbank-based technique for extracting more robust and discriminative features for telephony speech recognition. First, we propose an extended Lerner grouping method that approximates the shape of the Mel filters used in MFCC computation while reducing the cross-correlation between filterbank outputs. Next, we use Welch processing to reduce the variance of the spectral features while retaining spectral resolution. Finally, we describe experiments in which we augment the cepstral features with formant-related features computed using an adaptive filterbank. The new features represent the trajectory of the frequency components within different formant bands. Experimental results show that Welch processing consistently improves the word error rate on a large-vocabulary voice mail transcription task and that the formant-related features provide higher discriminability than the MFCC features.
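
As an illustration of the variance-reduction idea, the following is a minimal sketch of textbook Welch averaging applied to a single analysis frame: the frame is split into overlapping sub-segments, each is windowed, and the resulting periodograms are averaged. It is not the authors' implementation. The function name, the Hamming window, and the segment and overlap lengths are assumptions chosen for the example, and the abstract does not specify how the paper retains spectral resolution.

```python
import numpy as np

def welch_power_spectrum(frame, segment_len=128, overlap=64):
    """Average windowed periodograms over overlapping sub-segments of one frame.

    Hypothetical helper: segment_len, overlap, and the Hamming window are
    illustrative assumptions, not values taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    step = segment_len - overlap
    window = np.hamming(segment_len)
    # Number of full sub-segments that fit inside the frame.
    n_segments = 1 + (len(frame) - segment_len) // step
    spectra = []
    for i in range(n_segments):
        segment = frame[i * step : i * step + segment_len] * window
        # Periodogram of the windowed sub-segment (one-sided spectrum).
        spectra.append(np.abs(np.fft.rfft(segment)) ** 2)
    # Averaging the periodograms lowers the variance of each spectral bin.
    return np.mean(spectra, axis=0)
```

For a 400-sample frame with the defaults above, five sub-segments are averaged and the result has `segment_len // 2 + 1` bins. In plain Welch averaging the shorter sub-segments trade frequency resolution for lower variance, so this sketch shows only the averaging step.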

Bibliographic reference. Huang, Jun / Padmanabhan, Mukund (2000): "Filterbank-based feature extraction for speech recognition and its application to voice mail transcription", in ICSLP-2000, vol. 4, 668-671.