ISCA Archive ICSLP 1994

Talker localization and speech recognition using a microphone array and a cross-powerspectrum phase analysis

Diego Giuliani, Maurizio Omologo, P. Svaizer

A mismatch between training and testing conditions considerably reduces the performance of a speaker-independent HMM-based continuous speech recognizer. Compensating for this mismatch can avoid complex and time-consuming retraining of the recognizer. This paper describes an acquisition system based on an array of four omnidirectional microphones, which was used to produce a "beamformed" version of the original acoustic messages acquired in a noisy and reverberant environment, with a talker-microphone distance of one meter. In this preliminary activity, some simple noise compensation techniques (namely a Mean Spectrum based Enhancement and a Cepstrum Mean Subtraction) were incorporated in this preprocessing stage to obtain an enhanced version of the given utterance. Feeding a recognizer trained in clean conditions with the enhanced signals led to a significant improvement in performance compared with using unprocessed single-microphone signals as input.
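
The abstract does not detail the cross-power spectrum phase analysis named in the title. As an illustration only, the following is a minimal sketch of the generic technique (often called GCC-PHAT) for estimating the delay between two microphone signals, which is the quantity a delay-and-sum beamformer needs. It is not the authors' implementation: the function names `dft` and `csp_delay` are hypothetical, a naive O(N^2) DFT is used to keep the example dependency-free, and the delay is assumed to be a circular shift by a whole number of samples.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        # The inverse transform carries the 1/N normalization.
        out.append(acc / n if inverse else acc)
    return out


def csp_delay(x1, x2, eps=1e-12):
    """Estimate the delay (in samples) of x2 relative to x1.

    Assumption: both signals have the same length and the delay is
    small compared with that length. A positive result means x2 lags x1.
    """
    if len(x1) != len(x2):
        raise ValueError("signals must have the same length")
    n = len(x1)
    spec1 = dft(x1)
    spec2 = dft(x2)
    # Cross-power spectrum of the two channels.
    cross = [a.conjugate() * b for a, b in zip(spec1, spec2)]
    # Keep only the phase: divide each bin by its magnitude
    # (eps avoids division by zero in empty bins).
    whitened = [c / (abs(c) + eps) for c in cross]
    # Back in the time domain, the phase-only spectrum peaks at the delay.
    csp = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: csp[i])
    # Indices past the midpoint correspond to negative delays.
    return peak if peak <= n // 2 else peak - n
```

Discarding the magnitude and keeping only the phase gives every frequency bin equal weight, which tends to sharpen the correlation peak. In a real system, an FFT routine and sub-sample interpolation of the peak would normally replace the naive DFT and integer peak picking used here.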


Cite as: Giuliani, D., Omologo, M., Svaizer, P. (1994) Talker localization and speech recognition using a microphone array and a cross-powerspectrum phase analysis. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 1243-1246

@inproceedings{giuliani94_icslp,
  author={Diego Giuliani and Maurizio Omologo and P. Svaizer},
  title={{Talker localization and speech recognition using a microphone array and a cross-powerspectrum phase analysis}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={1243--1246}
}