Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Using the Modulation Wavelet Transform for Feature Extraction in Automatic Speech Recognition

Kanji Okada (1), Takayuki Arai (1), Noburu Kanederu (2), Yasunori Momomura (1), Yuji Murahara (1)

(1) Dept. of Electrical and Electronics Engineering, Sophia University, Tokyo, Japan
(2) Ishikawa National College of Technology, Japan

In this paper, we examine robust feature extraction methods for automatic speech recognition (ASR) in noise-distorted environments. Several perceptual experiments have shown that modulation frequencies between 1 and 16 Hz are important for human speech recognition, and the same modulation frequency band has been reported to be important for ASR. In particular, splitting this band into several sub-bands by combining the coefficients of a multi-resolutional Fourier transform increased recognition performance. Because combining the coefficients of a multi-resolutional Fourier transform corresponds to a wavelet transform, we applied the wavelet transform directly in recognition experiments to test its effectiveness and efficiency. This approach yielded an average increase of 3% in recognition accuracy over the standard approach using mel-frequency cepstral coefficients (MFCC) in several noise-distorted environments.
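
As a rough illustration of the band-splitting idea in the abstract, the following Python sketch divides the 1 to 16 Hz modulation range of a single log-energy trajectory into octave-wide bands, which is the kind of constant-Q partition a wavelet transform produces. It is not the authors' implementation. The function name, the 100 frames/s frame rate, the octave band edges, and the use of a single FFT over the whole trajectory are all assumptions made for illustration.

```python
import numpy as np


def octave_modulation_energies(trajectory, frame_rate=100.0, f_lo=1.0, f_hi=16.0):
    """Sum modulation-spectrum energy in octave bands between f_lo and f_hi.

    trajectory: 1-D time series of one spectral band's log energy
                (hypothetical input; one value per analysis frame).
    frame_rate: frames per second (assumed value, not from the paper).
    """
    x = np.asarray(trajectory, dtype=float)
    x = x - x.mean()  # remove the DC component of the trajectory

    # Modulation power spectrum of the trajectory.
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / frame_rate)

    # Octave-spaced band edges: 1, 2, 4, 8, 16 Hz for the default range.
    edges = [f_lo]
    while edges[-1] * 2.0 <= f_hi:
        edges.append(edges[-1] * 2.0)

    # Energy in each half-open band [lo, hi).
    energies = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return edges, energies


# Synthetic trajectory: 2 s at 100 frames/s with a 3 Hz modulation.
t = np.arange(200) / 100.0
trajectory = np.sin(2.0 * np.pi * 3.0 * t)
edges, energies = octave_modulation_energies(trajectory)
strongest_band = int(np.argmax(energies))  # index of the band holding most energy
```

With the synthetic 3 Hz input, most of the energy falls in the 2 to 4 Hz band. In a recognizer, per-band values of this kind would be computed for every spectral channel and assembled into a feature vector; how the paper does this is not specified in the abstract.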


Bibliographic reference.  Okada, Kanji / Arai, Takayuki / Kanederu, Noburu / Momomura, Yasunori / Murahara, Yuji (2000): "Using the modulation wavelet transform for feature extraction in automatic speech recognition", In ICSLP-2000, vol.1, 337-340.