ISCA Archive Interspeech 2008

Neural network based regression for robust overlapping speech recognition using microphone arrays

Weifeng Li, John Dines, Mathew Magimai Doss, Hervé Bourlard

This paper investigates a neural network based acoustic feature mapping to extract robust features for automatic speech recognition (ASR) of overlapping speech. The focus of this work is the novel investigation of additional sources of information to improve the effectiveness of the feature mapping. Specifically, we investigate two additional information sources. Firstly, we investigate the mapping of noisy, higher-order ASR features to clean, lower-order features, demonstrating that the redundancy in the higher-order representation can be exploited in the case of overlapping speech. Secondly, we investigate the mapping of features from multiple sound sources, namely from the target and interfering speakers, once again resulting in significant improvements to ASR performance. In the latter case we liken our approach to the post-filtering that is undertaken in conventional microphone array beamforming. We demonstrate the effectiveness of the proposed approach through extensive evaluations on the MONC corpus, which includes both non-overlapping single speaker and overlapping multi-speaker conditions.
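The two ideas in the abstract (regressing from noisy higher-order features to clean lower-order features, and feeding the network features from both the target and the interfering source) can be illustrated with a small sketch. This is not the authors' implementation: the feature dimensions (23 and 13), the single tanh hidden layer, the synthetic data, and every function name below are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Randomly initialise a one-hidden-layer regression MLP (hypothetical architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Return hidden activations and the linear regression output."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    y = h @ params["W2"] + params["b2"]
    return h, y

def train_step(params, x, t, lr):
    """One full-batch gradient step on mean squared error.

    Returns the loss measured before the parameter update.
    """
    h, y = forward(params, x)
    err = y - t
    loss = float(np.mean(err ** 2))
    g_y = 2.0 * err / err.size           # gradient of the loss w.r.t. the output
    g_h = g_y @ params["W2"].T           # back-propagate into the hidden layer
    g_a = g_h * (1.0 - h ** 2)           # derivative of tanh
    params["W2"] -= lr * (h.T @ g_y)
    params["b2"] -= lr * g_y.sum(axis=0)
    params["W1"] -= lr * (x.T @ g_a)
    params["b1"] -= lr * g_a.sum(axis=0)
    return loss

def run_demo(seed=0, n_frames=512, hi_dim=23, lo_dim=13, steps=500, lr=0.2):
    """Train on synthetic frames and return (first loss, last loss)."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins for real features (assumption, not corpus data).
    clean_lo = rng.normal(size=(n_frames, lo_dim))        # clean lower-order targets
    interferer_hi = rng.normal(size=(n_frames, hi_dim))   # interfering-speaker features
    mix = rng.normal(0.0, 1.0 / np.sqrt(lo_dim), (lo_dim, hi_dim))
    # Noisy higher-order target features: redundant view of the clean
    # features, corrupted by leakage from the interfering source.
    target_hi = clean_lo @ mix + 0.5 * interferer_hi
    # Network input: target and interferer features concatenated per frame.
    x = np.concatenate([target_hi, interferer_hi], axis=1)
    params = init_mlp(x.shape[1], 32, lo_dim, rng)
    losses = [train_step(params, x, clean_lo, lr) for _ in range(steps)]
    return losses[0], losses[-1]

if __name__ == "__main__":
    first, last = run_demo()
    print(f"MSE before training: {first:.4f}, after: {last:.4f}")
```

Because the interferer features are part of the input, the network can in principle learn to cancel the leakage term, which is loosely analogous to the post-filtering comparison drawn in the abstract.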


doi: 10.21437/Interspeech.2008-321

Cite as: Li, W., Dines, J., Doss, M.M., Bourlard, H. (2008) Neural network based regression for robust overlapping speech recognition using microphone arrays. Proc. Interspeech 2008, 2012-2015, doi: 10.21437/Interspeech.2008-321

@inproceedings{li08e_interspeech,
  author={Weifeng Li and John Dines and Mathew Magimai Doss and Hervé Bourlard},
  title={{Neural network based regression for robust overlapping speech recognition using microphone arrays}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2012--2015},
  doi={10.21437/Interspeech.2008-321}
}