Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Talker Localization and Speech Recognition Using a Microphone Array and a Cross-Powerspectrum Phase Analysis

Diego Giuliani, Maurizio Omologo, P. Svaizer

IRST-Istituto per la Ricerca Scientifica e Tecnologica, Povo di Trento, Italy

Mismatch between training and testing conditions considerably reduces the performance of a speaker-independent HMM-based continuous speech recognizer. Compensating for this mismatch can avoid the complex and time-consuming retraining of the recognizer. This paper describes an acquisition system, based on an array of four omnidirectional microphones, that was employed to produce a "beamformed" version of the original acoustic messages acquired in a noisy and reverberant environment with a talker-microphone distance of one meter. In this preliminary activity, some simple noise compensation techniques (i.e., a Mean Spectrum based Enhancement and a Cepstrum Mean Subtraction) were incorporated into this preprocessing stage to obtain an enhanced version of each utterance. Feeding a continuous speech recognizer trained under clean conditions with the enhanced signals led to a significant improvement in performance compared with using unprocessed single-microphone signals as input.
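The abstract only names the processing stages. As a rough illustration, here is a minimal NumPy sketch of two of them. It assumes a delay-and-sum beamformer steered by Cross-Power Spectrum Phase (CSP) time-delay estimates, where CSP is the cross-power spectrum normalized to unit magnitude and transformed back to the lag domain. It also assumes a per-utterance Cepstrum Mean Subtraction applied to a frames-by-coefficients cepstral matrix. The function names, the `max_lag` search limit, the circular shifts, and the whole-signal (rather than frame-wise) processing are all hypothetical choices made for illustration, not the authors' implementation. The Mean Spectrum based Enhancement is not shown.

```python
import numpy as np


def csp_delay(x_ref, x_m, max_lag):
    """Delay (in samples) of x_m relative to x_ref, taken at the CSP peak.

    CSP(l) = IFFT( X_m(f) X_ref*(f) / |X_m(f) X_ref*(f)| ) at lag l.
    Whole-signal version; a real system would work frame by frame.
    """
    n = len(x_ref)
    g = np.fft.rfft(x_m, n) * np.conj(np.fft.rfft(x_ref, n))
    csp = np.fft.irfft(g / (np.abs(g) + 1e-12), n)  # phase-only cross-correlation
    lags = np.arange(n)
    lags[lags > n // 2] -= n                        # map FFT indices to signed lags
    valid = np.abs(lags) <= max_lag                 # search physically plausible lags only
    return int(lags[valid][np.argmax(csp[valid])])


def delay_and_sum(channels, max_lag):
    """Align every channel to channel 0 using CSP delays, then average."""
    ref = channels[0]
    aligned = []
    for x in channels:
        d = csp_delay(ref, x, max_lag)
        aligned.append(np.roll(x, -d))  # circular shift: a simplifying assumption
    return np.mean(aligned, axis=0)


def cepstral_mean_subtraction(cepstra):
    """Subtract each coefficient's mean over all frames (frames x coeffs)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# Usage: four synthetic "microphones" carrying delayed copies of one source.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
true_delays = [0, 3, -2, 5]
mics = [np.roll(s, d) for d in true_delays]  # mic m lags the source by d samples
beamformed = delay_and_sum(mics, max_lag=10)
cepstra = rng.standard_normal((50, 12)) + 3.0  # stand-in cepstral matrix
normalized = cepstral_mean_subtraction(cepstra)
```

With pure circular shifts and no noise the CSP function is an ideal impulse at the true lag, so the beamformer recovers the source exactly. With real reverberant recordings the peak is broader and the estimated delays should be smoothed or validated.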

