9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Neural Network Based Regression for Robust Overlapping Speech Recognition Using Microphone Arrays

Weifeng Li, John Dines, Mathew Magimai Doss, Hervé Bourlard

IDIAP Research Institute, Switzerland

This paper investigates a neural network based acoustic feature mapping that extracts robust features for automatic speech recognition (ASR) of overlapping speech. The novel contribution is the use of two additional sources of information to improve the effectiveness of the feature mapping. First, we map noisy, higher-order ASR features to clean, lower-order features, demonstrating that the redundancy in the higher-order representation can be exploited in the case of overlapping speech. Second, we map features from multiple sound sources, namely the target and interfering speakers, which again yields significant improvements in ASR performance. In the latter case we liken our approach to the post-filtering undertaken in conventional microphone array beamforming. We demonstrate the effectiveness of the proposed approach through extensive evaluations on the MONC corpus, which includes both non-overlapping single-speaker and overlapping multi-speaker conditions.
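
The kind of regression described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the network size, the feature types, or the training procedure, so everything below is an assumption made for illustration. The sketch uses a one-hidden-layer network trained by gradient descent on synthetic data, with hypothetical dimensions (23 "higher-order" input features per source, 13 "lower-order" clean targets), and it concatenates the features of a target channel and an interferer channel as input.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
N_HIGH = 23   # higher-order noisy features per source
N_LOW = 13    # lower-order clean target features
N_HIDDEN = 32


def init_mlp(n_in, n_hidden, n_out, rng):
    """Randomly initialise a one-hidden-layer regression network."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Return hidden activations and the linear regression output."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    y = h @ params["W2"] + params["b2"]
    return h, y


def mse(params, x, t):
    """Mean squared error between mapped features and clean targets."""
    _, y = forward(params, x)
    return float(np.mean((y - t) ** 2))


def train(params, x, t, lr=0.02, epochs=300):
    """Full-batch gradient descent on the squared error (updates in place)."""
    n = x.shape[0]
    for _ in range(epochs):
        h, y = forward(params, x)
        err = (y - t) / n                                # output-layer error
        dh = (err @ params["W2"].T) * (1.0 - h ** 2)     # backprop through tanh
        params["W2"] -= lr * (h.T @ err)
        params["b2"] -= lr * err.sum(axis=0)
        params["W1"] -= lr * (x.T @ dh)
        params["b1"] -= lr * dh.sum(axis=0)
    return params


def make_synthetic_data(n_frames, rng):
    """Synthetic stand-in for overlapping speech: each channel leaks the other source."""
    clean = rng.normal(size=(n_frames, N_LOW))           # clean target-speaker features
    interferer = rng.normal(size=(n_frames, N_LOW))      # interfering-speaker features
    mix = lambda: rng.normal(0.0, 1.0 / np.sqrt(N_LOW), (N_LOW, N_HIGH))
    noise = lambda: 0.1 * rng.normal(size=(n_frames, N_HIGH))
    target_chan = clean @ mix() + 0.5 * interferer @ mix() + noise()
    interf_chan = interferer @ mix() + 0.5 * clean @ mix() + noise()
    x = np.concatenate([target_chan, interf_chan], axis=1)  # multi-source input
    return x, clean


def demo(seed=0):
    """Train on synthetic frames and return the error before and after training."""
    rng = np.random.default_rng(seed)
    x, t = make_synthetic_data(500, rng)
    params = init_mlp(2 * N_HIGH, N_HIDDEN, N_LOW, rng)
    before = mse(params, x, t)
    train(params, x, t)
    after = mse(params, x, t)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"MSE before training: {before:.3f}, after: {after:.3f}")
```

In a real system the synthetic arrays would be replaced by frame-level features extracted from the microphone array outputs, and the mapped features would be passed on to the recogniser; those stages are outside the scope of this sketch.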

Bibliographic reference.  Li, Weifeng / Dines, John / Doss, Mathew Magimai / Bourlard, Hervé (2008): "Neural network based regression for robust overlapping speech recognition using microphone arrays", In INTERSPEECH-2008, 2012-2015.