Machine Listening in Multisource Environments (CHiME) 2011
This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two stage fragment decoding approach: first spectro-temporal acoustic source fragments are identified using signal level cues, and second, a hypothesisdriven stage simultaneously searches for the most probable speech/background fragment labelling and the corresponding acoustic model state sequence. The paper reports recent advances in combining adaptive noise floor modelling and binaural localisation cues within this framework. The decoder is able to derive significant recognition performance benefits from both noise floor tracking and fragment location estimates. Using models trained on noise-free speech, the system achieves an average keyword recognition accuracy of 80.60% for the final test set on the PASCAL CHiME Challenge task.
Bibliographic reference. Ma, Ning / Barker, Jon / Christensen, Heidi / Green, Phil (2011): "Recent advances in fragment-based speech recognition in reverberant multisource environments", In CHiME-2011, 68-73.