ISCA Archive ICSLP 2000

Using the modulation wavelet transform for feature extraction in automatic speech recognition

Kanji Okada, Takayuki Arai, Noburu Kanederu, Yasunori Momomura, Yuji Murahara

In this paper, we examine robust feature extraction methods for automatic speech recognition (ASR) in noise-distorted environments. Several perceptual experiments have shown that modulation frequencies between 1 and 16 Hz are important for human speech recognition, and it has been reported that the same modulation frequency band is important for ASR. Splitting this band into several sub-bands by combining the coefficients of a multi-resolution Fourier transform was found to be especially effective in increasing recognition performance. Because combining the coefficients of a multi-resolution Fourier transform corresponds to a wavelet transform, we applied the wavelet transform in recognition experiments to test its effectiveness and efficiency. This approach yielded an average increase of 3% in recognition accuracy over the standard approach using mel-frequency cepstral coefficients (MFCC) in several noise-distorted environments.
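As a rough illustration of the idea, the sketch below applies a decimated Haar wavelet decomposition along the time axis of each log-spectral band trajectory, so that each detail level covers one octave of modulation frequency. It is not the authors' implementation: the Haar wavelet, the 100 Hz frame rate used in the usage note, the mean-squared-coefficient summary, and the function names `haar_dwt_levels`, `modulation_band_edges`, and `modulation_energies` are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np


def haar_dwt_levels(x, levels):
    """Decimated Haar DWT of a 1-D trajectory (assumed wavelet, not from the paper).

    Returns (details, approx): a list with one detail-coefficient array per
    level, finest level first, and the final approximation coefficients.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            # Pad odd-length input by repeating the last sample.
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details, approx


def modulation_band_edges(frame_rate_hz, levels):
    """Nominal (low, high) modulation band in Hz for detail levels 1..levels."""
    return [
        (frame_rate_hz / 2 ** (k + 1), frame_rate_hz / 2 ** k)
        for k in range(1, levels + 1)
    ]


def modulation_energies(log_spec, levels):
    """Mean squared detail coefficient per level and band.

    log_spec has shape (n_frames, n_bands); the result has shape
    (levels, n_bands). This summary statistic is a hypothetical choice.
    """
    log_spec = np.asarray(log_spec, dtype=float)
    out = np.zeros((levels, log_spec.shape[1]))
    for band in range(log_spec.shape[1]):
        details, _ = haar_dwt_levels(log_spec[:, band], levels)
        for k, d in enumerate(details):
            out[k, band] = np.mean(d ** 2)
    return out
```

Under the assumed 100 Hz frame rate, `modulation_band_edges(100.0, 6)` gives nominal octave bands from 25-50 Hz down to about 0.78-1.56 Hz, so levels 3 to 6 roughly span the 1 to 16 Hz range that the abstract identifies as important. A real front end would likely use a smoother wavelet and a different feature summary.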


Cite as: Okada, K., Arai, T., Kanederu, N., Momomura, Y., Murahara, Y. (2000) Using the modulation wavelet transform for feature extraction in automatic speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 337-340

@inproceedings{okada00_icslp,
  author={Kanji Okada and Takayuki Arai and Noburu Kanederu and Yasunori Momomura and Yuji Murahara},
  title={{Using the modulation wavelet transform for feature extraction in automatic speech recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 337-340}
}