International Workshop on Hands-Free Speech Communication (HSC2001)

April 9-11, 2001
Kyoto, Japan

Speech Recognition Under Noisy Environments Using Spectral Subtraction With Smoothing Of Time Direction And Real-Time Cepstral Mean Normalization

Norihide Kitaoka (1), Ichiro Akahori (1), Seiichi Nakagawa (2)

(1) DENSO Corp., Japan
(2) Toyohashi University of Technology

Spectral subtraction (SS) is often used to reduce the effects of additive noise. However, SS in the power spectral domain has a known problem: it cannot remove the effect of the correlation between speech and noise.
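For one frequency bin, the observed spectrum is X = S + N, so |X|^2 = |S|^2 + |N|^2 + 2|S||N|cos(theta), where theta is the phase difference between speech and noise. Power-domain SS subtracts only the |N|^2 term, and the cross term remains in the estimate. The sketch below shows conventional power SS and that leftover term. The noise estimate taken from noise-only frames, the over-subtraction factor `alpha` and the flooring factor `beta` are common SS choices assumed here for illustration; they are not taken from the paper.

```python
def estimate_noise(noise_frames):
    """Average noise power spectrum |N(k)|^2 over frames assumed to be noise-only."""
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]


def spectral_subtract(frame_power, noise_power, alpha=1.0, beta=0.01):
    """Power-domain SS for one frame: |S|^2 ~= |X|^2 - alpha*|N|^2, floored at beta*|X|^2.

    alpha and beta are illustrative assumptions, not values from the paper.
    """
    return [max(x - alpha * n, beta * x) for x, n in zip(frame_power, noise_power)]


def cross_term(speech, noise):
    """Residual left in |X|^2 - |N|^2 for one bin: 2*Re(S * conj(N)) = 2|S||N|cos(theta)."""
    return 2.0 * (speech * noise.conjugate()).real


# One bin where speech and noise are in phase (theta = 0).
s, n = 1.0 + 0.0j, 0.5 + 0.0j
x = s + n
print(abs(x) ** 2 - abs(n) ** 2)        # 2.0: SS output is not |S|^2 = 1.0
print(abs(s) ** 2 + cross_term(s, n))   # 2.0: the difference is exactly the cross term
```

Because theta is unknown and changes from frame to frame, no fixed subtraction in a single frame can remove this term.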

In this paper, we propose a spectral subtraction method with smoothing in the time direction to solve this problem. The estimated speech power spectrum of a frame is taken to be the average of the estimated speech power spectra over several frames. This reduces the effect of the correlation between speech and noise, and the method becomes more effective with a shorter frame length. With these methods, we achieve a 14% improvement in recognition rate over conventional SS in a large-vocabulary isolated word recognition test.
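A minimal sketch of time-direction smoothing, assuming a window of 2*`half_width`+1 frames centered on the current frame (clipped at the edges), with the raw per-frame estimates averaged before flooring. The window shape, its length and the averaging-then-flooring order are assumptions for illustration; the abstract says only that estimates are averaged over some frames. The intuition, under the assumption that the speech/noise phase difference varies across frames, is that the cross terms 2|S||N|cos(theta) take both signs and partly cancel when averaged.

```python
def smoothed_spectral_subtraction(frames, noise_power, half_width=1, alpha=1.0, beta=0.01):
    """Power SS followed by averaging along the time axis.

    frames: list of per-frame power spectra |X_t(k)|^2.
    Each frame's speech estimate is the mean of the raw SS estimates over frames
    t-half_width .. t+half_width (clipped at the edges), then floored at beta*|X_t|^2.
    half_width, alpha and beta are illustrative assumptions.
    """
    n_frames, n_bins = len(frames), len(noise_power)
    # Raw per-frame estimates |X_t|^2 - alpha*|N|^2; each still contains its cross term.
    raw = [[x - alpha * n for x, n in zip(f, noise_power)] for f in frames]
    smoothed = []
    for t in range(n_frames):
        window = raw[max(0, t - half_width):min(n_frames, t + half_width + 1)]
        # Averaging over neighbouring frames lets cross terms of opposite sign cancel.
        avg = [sum(r[k] for r in window) / len(window) for k in range(n_bins)]
        smoothed.append([max(a, beta * x) for a, x in zip(avg, frames[t])])
    return smoothed


# Speech power 1.0 in one bin; noise amplitude 0.5 whose phase flips between frames,
# so |X|^2 is 1.5^2 = 2.25 in the first frame and 0.5^2 = 0.25 in the second.
print(smoothed_spectral_subtraction([[2.25], [0.25]], [0.25]))
```

In this toy case, per-frame SS would give 2.0 and a floored value near 0, while the smoothed estimate is 1.0 in both frames, matching |S|^2. With `half_width=0` the function reduces to the conventional per-frame SS.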

We also propose performing recognition with acoustic models trained on noisy speech processed by this smoothing method. These models improved the recognition rate by over 10% compared with the original models.

Cepstral mean normalization (CMN) has been used to reduce convolutional noise caused by differences in transmission characteristics. With conventional CMN, the system must wait until the end of the utterance before starting recognition. We modified the method to estimate the compensation parameters from the past few utterances, which enables real-time recognition. Under car noise at 0 dB SNR, this method improved the recognition rate of the above system by approximately 7%.
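The contrast between the two forms of CMN can be sketched as follows. Conventional CMN needs every frame of the utterance to compute the mean it subtracts. The real-time variant below fixes the mean before the utterance starts, using the frames of the last `n_past` utterances. The class name `PastUtteranceCMN`, the default `n_past=3`, the frame-weighted pooling of past utterances and the choice to leave the first utterance unnormalized are all hypothetical choices for illustration, not details from the paper.

```python
from collections import deque


def utterance_cmn(cepstra):
    """Conventional CMN: the mean uses every frame, so output waits for the utterance end."""
    dim = len(cepstra[0])
    mean = [sum(c[d] for c in cepstra) / len(cepstra) for d in range(dim)]
    return [[c[d] - mean[d] for d in range(dim)] for c in cepstra]


class PastUtteranceCMN:
    """Hypothetical real-time CMN: subtract a mean estimated from the last n_past utterances."""

    def __init__(self, dim, n_past=3):
        self.dim = dim
        # Each entry is (per-dimension sum of cepstra, frame count) for one past utterance.
        self.history = deque(maxlen=n_past)

    def current_mean(self):
        total = sum(count for _, count in self.history)
        if total == 0:
            return [0.0] * self.dim  # no history yet: leave frames unnormalized (assumption)
        return [sum(s[d] for s, _ in self.history) / total for d in range(self.dim)]

    def process_utterance(self, cepstra):
        mean = self.current_mean()  # fixed before the utterance starts, so no look-ahead
        out = [[x - m for x, m in zip(frame, mean)] for frame in cepstra]
        # After the utterance ends, store its statistics for the following utterances.
        sums = [sum(c[d] for c in cepstra) for d in range(self.dim)]
        self.history.append((sums, len(cepstra)))
        return out


cmn = PastUtteranceCMN(dim=2, n_past=2)
print(cmn.process_utterance([[1.0, 2.0], [3.0, 4.0]]))  # first utterance passes through
print(cmn.process_utterance([[5.0, 6.0]]))              # normalized with the first one's mean
```

Because the mean is fixed before an utterance begins, each incoming frame could be normalized and passed to the recognizer immediately; the list-based interface here is only for compactness.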



Bibliographic reference.  Kitaoka, Norihide / Akahori, Ichiro / Nakagawa, Seiichi (2001): "Speech recognition under noisy environments using spectral subtraction with smoothing of time direction and real-time cepstral mean normalization", In HSC2001, 159-162.