7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

A Spatio-Temporal Speech Enhancement Scheme for Robust Speech Recognition

Erik Visser (1), Manabu Otsuka (2), Te-Won Lee (1)

(1) University of California at San Diego, USA; (2) Denso Corporation, Japan

A new speech enhancement scheme is presented that integrates spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. This spatial processing stage applies blind source separation algorithms, which assume no a priori knowledge about the sources involved. Distributed background noise is then removed in a combined spatial/temporal processing approach: the desired speaker signal is first processed along with an artificially constructed noise signal in a supplementary blind source separation step, and is then further denoised in a wavelet filterbank by exploiting differences in the temporal statistics of speech and noise. The scheme’s performance is illustrated by speech recognition experiments on real recordings in a noisy car environment and compared to conventional techniques such as beamforming and spectral subtraction.
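
As a rough illustration of the wavelet-domain denoising idea mentioned in the abstract, the sketch below applies soft thresholding to the detail coefficients of a one-level Haar transform. This is a generic textbook technique swapped in for illustration, not the authors' filterbank or their statistical model; the function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `denoise`) and the fixed `threshold` parameter are assumptions introduced here.

```python
import math


def haar_forward(x):
    """One-level orthonormal Haar transform: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (assumed fixed here)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(x, threshold):
    """Threshold only the detail band, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the signal is reconstructed unchanged; with a threshold larger than every detail coefficient, each sample pair collapses to its mean. A practical system would use a multi-level filterbank and choose thresholds from estimated speech and noise statistics rather than a single fixed value.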


Bibliographic reference. Visser, Erik / Otsuka, Manabu / Lee, Te-Won (2002): "A spatio-temporal speech enhancement scheme for robust speech recognition", in ICSLP-2002, 1821-1824.