Sixth International Conference on Spoken Language Processing (ICSLP 2000)
Beijing, China
October 16-20, 2000
Using the Modulation Wavelet Transform for Feature Extraction in Automatic Speech Recognition
Kanji Okada (1), Takayuki Arai (1), Noburu Kanederu (2), Yasunori Momomura (1), Yuji Murahara (1)
(1) Dept. of Electrical and Electronics Engineering,
Sophia University, Tokyo, Japan
(2) Ishikawa National College of Technology, Japan
In this paper, we examine robust feature extraction methods
for automatic speech recognition (ASR) in noise-distorted
environments. Several perceptual experiments
have shown that the modulation frequency band between
1 and 16 Hz is important for human speech
recognition. Furthermore, it has been reported that the same
modulation frequency band is important for ASR.
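To make the 1-16 Hz claim concrete, the following Python sketch estimates what fraction of a feature trajectory's modulation energy (for example, one band's log energy over time) falls inside that band. It assumes a hypothetical 100 Hz frame rate (10 ms frame shift) and uses a plain DFT; `modulation_spectrum` and `band_energy_fraction` are illustrative names, not functions from the paper.

```python
import cmath
import math

FRAME_RATE_HZ = 100.0  # assumption: 10 ms frame shift, not stated in the abstract


def modulation_spectrum(trajectory, frame_rate=FRAME_RATE_HZ):
    """Return (frequency_hz, power) pairs for DFT bins 0..n/2 of a
    mean-removed feature trajectory sampled at `frame_rate` frames/s."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    x = [v - mean for v in trajectory]  # remove DC so it does not dominate
    spectrum = []
    for k in range(n // 2 + 1):  # one-sided spectrum of a real signal
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * frame_rate / n, abs(coeff) ** 2))
    return spectrum


def band_energy_fraction(trajectory, low_hz=1.0, high_hz=16.0,
                         frame_rate=FRAME_RATE_HZ):
    """Fraction of modulation power whose frequency lies in [low_hz, high_hz]."""
    spec = modulation_spectrum(trajectory, frame_rate)
    total = sum(p for _, p in spec)
    if total == 0.0:
        return 0.0  # constant trajectory: no modulation at all
    in_band = sum(p for f, p in spec if low_hz <= f <= high_hz)
    return in_band / total
```

For a 2 s trajectory modulated at a syllable-like 4 Hz, nearly all of the power lands inside the 1-16 Hz band, whereas a 30 Hz modulation falls almost entirely outside it.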
Splitting this important modulation frequency band into
several sub-bands, by combining the coefficients of a
multi-resolutional Fourier transform, notably increased
recognition performance. Because combining the coefficients
of a multi-resolutional Fourier transform corresponds to a
wavelet transform, we applied the wavelet transform to
recognition experiments to test its effectiveness and
efficiency.
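The correspondence between combined multi-resolution Fourier coefficients and a wavelet transform can be illustrated with a dyadic decomposition, in which each level halves the modulation-frequency band and so yields octave-wide sub-bands. The following is a minimal Python sketch, assuming an orthonormal Haar wavelet and a hypothetical 100 Hz frame rate; `haar_dwt`, `detail_band_hz`, and `subband_energies` are illustrative names, not the authors' implementation.

```python
import math

FRAME_RATE_HZ = 100.0  # hypothetical 10 ms frame shift


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (details, approx): details[j] holds the level-(j+1) detail
    coefficients (finest first); approx holds the final approximation."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd length by repeating last sample
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details.append(detail)
    return details, approx


def detail_band_hz(level, frame_rate=FRAME_RATE_HZ):
    """Nominal modulation-frequency band (low, high) of detail level `level` (1-based)."""
    return frame_rate / 2 ** (level + 1), frame_rate / 2 ** level


def subband_energies(trajectory, levels=5, frame_rate=FRAME_RATE_HZ):
    """List of ((low_hz, high_hz), energy) for each detail level, then the approximation."""
    details, approx = haar_dwt(trajectory, levels)
    out = []
    for j, d in enumerate(details, start=1):
        out.append((detail_band_hz(j, frame_rate), sum(c * c for c in d)))
    out.append(((0.0, frame_rate / 2 ** (levels + 1)), sum(c * c for c in approx)))
    return out
```

With the assumed 100 Hz frame rate, detail levels 2 through 5 span roughly 1.6-25 Hz, which brackets rather than exactly matches the 1-16 Hz band; a 64 Hz frame rate would place the dyadic edges at 1, 2, 4, 8, and 16 Hz. The Haar wavelet is used here only for simplicity, since the abstract does not say which mother wavelet the authors used.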
This approach yielded an average increase of 3% in
recognition accuracy over the standard approach using
mel-frequency cepstral coefficients (MFCC) in several
noise-distorted environments.
Bibliographic reference.
Okada, Kanji / Arai, Takayuki / Kanederu, Noburu / Momomura, Yasunori / Murahara, Yuji (2000):
"Using the modulation wavelet transform for feature extraction in automatic speech recognition",
In ICSLP-2000, vol.1, 337-340.