8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Robust Speech Recognition based on HMM Composition and Modified Wiener Filter

Sumitaka Sakauchi, Yoshikazu Yamaguchi, Satoshi Takahashi, Satoshi Kobashikawa

NTT Corporation, Japan

This paper combines the HMM composition method with a highly efficient noise reduction method to create a robust speech recognition technique for additive noise environments. Speech recorded by hands-free microphones in the real world suffers from 1) a low speech-to-noise ratio (S/N) and 2) changes in S/N. In particular, S/N varies with the speaker and from utterance to utterance, even in the same noise environment. To deal with the low S/N, the proposed technique uses the modified Wiener filter (WF) method for noise reduction, which keeps S/N higher than is possible with spectral subtraction (SS) while minimizing speech distortion. To compensate for the remaining additive noise, the proposed technique uses the HMM composition method with clean speech models and a noise model trained on the remaining noise. To offset rapid changes in S/N when the S/N may not be known, HMMs composed under various S/N conditions are run in parallel to obtain better recognition results; rapid response is achieved because no adaptation is needed to handle speech distortion. The new technique reduces the average recognition error by 21.6% under various noise conditions compared with the basic HMM composition method.
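The idea of running models composed at several S/N conditions in parallel can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian mean in the log power spectral domain instead of a full HMM, uses the common log-add approximation for composition, and fixes a shared unit variance. The function names (compose_mean, log_likelihood, pick_best_snr) and the S/N grid are hypothetical and chosen only for illustration.

```python
import math

def compose_mean(clean_log, noise_log, snr_db):
    """Log-add composition of one log-power-spectrum mean at a target S/N (dB).

    Assumption: speech and noise add in the linear power domain, and the
    noise is scaled so the overall speech-to-noise power ratio equals snr_db.
    """
    speech_pow = sum(math.exp(v) for v in clean_log)
    noise_pow = sum(math.exp(v) for v in noise_log)
    # Noise gain such that speech_pow / (gain * noise_pow) == 10^(snr_db/10).
    gain = speech_pow / (noise_pow * 10.0 ** (snr_db / 10.0))
    return [math.log(math.exp(s) + gain * math.exp(n))
            for s, n in zip(clean_log, noise_log)]

def log_likelihood(obs, mean, var=1.0):
    """Diagonal Gaussian log-likelihood with a shared variance (toy assumption)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (o - m) ** 2 / var)
               for o, m in zip(obs, mean))

def pick_best_snr(obs, clean_log, noise_log, snr_grid_db):
    """Score the observation against every composed model; return the best S/N."""
    scores = {snr: log_likelihood(obs, compose_mean(clean_log, noise_log, snr))
              for snr in snr_grid_db}
    return max(scores, key=scores.get)

# Hypothetical three-bin example: the observation is synthesized at 10 dB,
# and the parallel models are composed at 0, 10, and 20 dB.
clean = [2.0, 1.0, 0.0]
noise = [0.0, 0.0, 0.0]
observation = compose_mean(clean, noise, 10)
best = pick_best_snr(observation, clean, noise, [0, 10, 20])
```

In this toy setting the model composed at the matching S/N scores highest, so no explicit S/N estimate is needed before recognition. A real system would compose every state of the clean-speech HMMs with the residual-noise model and compare decoder scores across the parallel model sets.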


Bibliographic reference. Sakauchi, Sumitaka / Yamaguchi, Yoshikazu / Takahashi, Satoshi / Kobashikawa, Satoshi (2004): "Robust speech recognition based on HMM composition and modified Wiener filter", In INTERSPEECH-2004, 2053-2056.