8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Robust Speech Recognition With Spectral Subtraction In Low SNR

Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

Nara Institute of Science and Technology, Japan

Robust speech recognition in noisy environments is a very difficult task. It is desirable to find parameters that relate the speech enhancement technique directly to the recognizer. In this paper, Noise Reduction Rate (NRR) and Mel Cepstrum Distortion (MelCD) are investigated when using Spectral Subtraction (SS). Under low SNR conditions such as 0 dB, 5 dB, and 10 dB, neither maximizing NRR nor minimizing MelCD results in better recognition performance. Thus, conventional SS, in which the over-subtraction parameter (alpha) is a function of SNR, proves ineffective from the point of view of the recognizer. Our proposed method derives alpha for SS directly from the training utterances used in creating the Hidden Markov Models (HMMs), so as to optimize recognition performance. By superimposing office noise on the SS-denoised noisy speech, we achieved relative increases in word accuracy of 26.0% and 7.6% for the proposed matched and generalized alpha, respectively.
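
The role of the over-subtraction parameter can be illustrated with a minimal sketch. The abstract does not give the exact SS formula, so the code below assumes a common power-domain form with a spectral floor; the function names `spectral_subtract` and `pick_alpha`, the `floor` parameter, and the `score` callback (standing in for recognition accuracy measured on training utterances) are all hypothetical and are not taken from the paper.

```python
def spectral_subtract(noisy_power, noise_power, alpha, floor=0.01):
    """Apply power spectral subtraction to one frame.

    noisy_power, noise_power: per-bin power values (lists of floats).
    alpha: over-subtraction factor applied to the noise estimate.
    floor: assumed spectral floor factor (not from the paper); keeps
           each output bin at or above floor * noisy power.
    """
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        subtracted = y - alpha * n
        cleaned.append(max(subtracted, floor * y))
    return cleaned


def pick_alpha(candidates, score):
    """Return the candidate alpha with the highest score.

    score: hypothetical callback mapping alpha to a figure of merit,
           e.g. word accuracy on training utterances denoised with
           that alpha.
    """
    return max(candidates, key=score)
```

In this sketch a larger alpha removes more of the estimated noise but drives more bins to the floor. Selecting alpha through `pick_alpha` with a recognizer-based score, instead of tying it to SNR, reflects the general idea described in the abstract.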

Bibliographic reference.  Gomez, Randy / Lee, Akinobu / Saruwatari, Hiroshi / Shikano, Kiyohiro (2004): "Robust speech recognition with spectral subtraction in low SNR", In INTERSPEECH-2004, 2077-2080.