Machine Listening in Multisource Environments (CHiME) 2011

Florence, Italy
September 1, 2011

Recent Advances in Fragment-Based Speech Recognition in Reverberant Multisource Environments

Ning Ma, Jon Barker, Heidi Christensen, Phil Green

Department of Computer Science, University of Sheffield, Sheffield, UK

This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two-stage fragment decoding approach: first, spectro-temporal acoustic source fragments are identified using signal-level cues; second, a hypothesis-driven stage simultaneously searches for the most probable speech/background fragment labelling and the corresponding acoustic model state sequence. The paper reports recent advances in combining adaptive noise floor modelling and binaural localisation cues within this framework. The decoder derives significant recognition performance benefits from both noise floor tracking and fragment location estimates. Using models trained on noise-free speech, the system achieves an average keyword recognition accuracy of 80.60% on the final test set of the PASCAL CHiME Challenge task.

Bibliographic reference. Ma, Ning / Barker, Jon / Christensen, Heidi / Green, Phil (2011): "Recent advances in fragment-based speech recognition in reverberant multisource environments", in CHiME-2011, 68-73.