This paper investigates a neural network based acoustic feature mapping to extract robust features for automatic speech recognition (ASR) of overlapping speech. The focus of this work is the novel investigation of additional sources of information to improve the effectiveness of the feature mapping. Specifically, we investigate two such sources. Firstly, we investigate the mapping of noisy, higher-order ASR features to clean, lower-order features, demonstrating that the redundancy in the higher-order representation can be exploited in the case of overlapping speech. Secondly, we investigate the mapping of features from multiple sound sources, namely from the target and interfering speakers, once again obtaining significant improvements in ASR performance. In the latter case we liken our approach to the post-filtering undertaken in conventional microphone array beamforming. We demonstrate the effectiveness of the proposed approach through extensive evaluations on the MONC corpus, which includes both non-overlapping single-speaker and overlapping multi-speaker conditions.
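The core idea can be sketched as a multi-output regression: a small neural network is trained to map noisy features (here, concatenated features from the target and interfering sources) to clean features of lower dimensionality. The sketch below is a minimal illustration only; the dimensions, architecture, and synthetic data are assumptions and do not reproduce the paper's actual setup or the MONC corpus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not from the paper): a 48-dim noisy
# input (24-dim features from the target source + 24-dim from the
# interferer) is mapped to 13-dim "clean" features.
d_in, d_hid, d_out = 48, 64, 13
n = 512

# Synthetic stand-in data: in the paper's setting X would hold features of
# overlapping speech and Y the corresponding clean-speech features.
X = rng.standard_normal((n, d_in))
W_true = 0.1 * rng.standard_normal((d_in, d_out))
Y = np.tanh(X @ W_true) + 0.01 * rng.standard_normal((n, d_out))

# One-hidden-layer MLP trained by plain gradient descent to minimise the
# mean-squared error between mapped and clean features.
W1 = 0.1 * rng.standard_normal((d_in, d_hid)); b1 = np.zeros(d_hid)
W2 = 0.1 * rng.standard_normal((d_hid, d_out)); b2 = np.zeros(d_out)

def forward(X):
    H = np.tanh(X @ W1 + b1)   # hidden activations
    return H, H @ W2 + b2      # hidden layer, predicted clean features

_, P0 = forward(X)
loss_before = ((P0 - Y) ** 2).mean()

lr = 0.02
for step in range(300):
    H, P = forward(X)
    gP = 2.0 * (P - Y) / n     # gradient of MSE w.r.t. predictions
    # Backpropagation through the two layers
    gW2 = H.T @ gP;            gb2 = gP.sum(axis=0)
    gH = (gP @ W2.T) * (1.0 - H ** 2)
    gW1 = X.T @ gH;            gb1 = gH.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, P1 = forward(X)
loss_after = ((P1 - Y) ** 2).mean()
```

After training, `loss_after` should be well below `loss_before`, i.e. the network has learned a mapping from the concatenated noisy features toward the clean targets.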
Bibliographic reference. Li, Weifeng / Dines, John / Doss, Mathew Magimai / Bourlard, Hervé (2008): "Neural network based regression for robust overlapping speech recognition using microphone arrays", in Proc. INTERSPEECH 2008, pp. 2012-2015.