7th International Conference on Spoken Language Processing
September 16-20, 2002
A new speech enhancement scheme is presented that integrates spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms, which assume no a priori knowledge about the sources involved, are applied in this spatial processing stage. Distributed background noise is then suppressed in a combined spatial/temporal processing approach. The desired speaker signal is first processed along with an artificially constructed noise signal in a supplementary blind source separation step. It is then further denoised in a wavelet filterbank by exploiting differences between the temporal statistics of speech and noise. The scheme's performance is illustrated by speech recognition experiments on real recordings in a noisy car environment and compared to conventional techniques such as beamforming and spectral subtraction.
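As a rough illustration of the spatial processing stage only, the following sketch separates two sources from two microphone channels. It is not the authors' algorithm: it assumes an instantaneous (non-convolutive) mixture, whereas real car recordings involve delays and reverberation, and it substitutes a plain FastICA iteration with a tanh nonlinearity for whatever blind source separation method the paper uses. The function names `whiten`, `sym_decorrelate` and `fastica` are hypothetical.

```python
import numpy as np


def whiten(x):
    """Center a (channels, samples) array and decorrelate it to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    whitener = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return whitener @ x


def sym_decorrelate(w):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    eigval, eigvec = np.linalg.eigh(w @ w.T)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T @ w


def fastica(x, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent sources from an instantaneous mixture x.

    Assumption for illustration: x = A @ s with a square, invertible A.
    Sources are recovered only up to permutation, sign and scale.
    """
    z = whiten(x)
    n_channels, n_samples = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n_channels, n_channels)))
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)            # contrast nonlinearity
        g_prime = 1.0 - g ** 2    # its derivative
        # Fixed-point update, applied to every row of W at once.
        w_new = (g @ z.T) / n_samples - np.diag(g_prime.mean(axis=1)) @ w
        w_new = sym_decorrelate(w_new)
        # Converged when each row only changes by a sign flip at most.
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins: a heavy-tailed "speech-like" source and uniform noise.
    sources = np.vstack([rng.laplace(size=20000),
                         rng.uniform(-1.0, 1.0, size=20000)])
    mixing = np.array([[1.0, 0.6], [0.5, 1.0]])  # hypothetical two-microphone mixing
    estimates = fastica(mixing @ sources)
    # Absolute correlation between each estimate (rows) and each source (columns).
    print(np.abs(np.corrcoef(np.vstack([estimates, sources]))[:2, 2:]).round(3))
```

The printed matrix shows how strongly each estimated channel correlates with each synthetic source; because of the permutation and sign ambiguity, the large entries may sit on either diagonal. The later stages described in the abstract (the supplementary separation step with a constructed noise signal and the wavelet filterbank denoising) are not covered by this sketch.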
Bibliographic reference. Visser, Erik / Otsuka, Manabu / Lee, Te-Won (2002): "A spatio-temporal speech enhancement scheme for robust speech recognition", In ICSLP-2002, 1821-1824.