12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Binaural Cues for Fragment-Based Speech Recognition in Reverberant Multisource Environments

Ning Ma, Jon Barker, Heidi Christensen, Phil D. Green

University of Sheffield, UK

This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two-stage fragment decoding approach: first, spectro-temporal acoustic source fragments are identified using signal-level cues; second, a hypothesis-driven stage simultaneously searches for the most probable speech/background fragment labelling and the corresponding acoustic model state sequence. The paper reports the first successful attempt to use binaural localisation cues within this framework. By integrating binaural cues and acoustic models in a consistent probabilistic framework, the decoder is able to derive significant recognition performance benefits from fragment location estimates despite their inherent unreliability.
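
As a rough illustration of the second stage, the sketch below enumerates every speech/background labelling of a few fragments and scores each one by adding a toy acoustic log-likelihood to a soft location-based log prior. Everything in it is an assumption made for illustration: the function name `best_labelling`, the dictionary keys, the Gaussian-shaped prior with width `sigma`, and the exhaustive search. The decoder described in the paper searches jointly over labellings and acoustic model state sequences, which this sketch does not attempt.

```python
import math
from itertools import product


def best_labelling(fragments, target_azimuth, sigma=20.0):
    """Return (labels, score) for the highest-scoring labelling.

    Each fragment is a dict with hypothetical keys:
      'speech_ll'     : log-likelihood if the fragment is speech
      'background_ll' : log-likelihood if the fragment is background
      'azimuth'       : noisy location estimate in degrees
    labels[i] is True when fragment i is labelled as speech.
    """
    best_labels, best_score = None, -math.inf
    for labels in product((True, False), repeat=len(fragments)):
        score = 0.0
        for frag, is_speech in zip(fragments, labels):
            # Soft location prior (assumed form): the closer the fragment's
            # estimated azimuth is to the target, the likelier it is speech.
            d = frag["azimuth"] - target_azimuth
            p_speech = math.exp(-0.5 * (d / sigma) ** 2)
            # Clamp so an unreliable estimate never vetoes a label outright.
            p_speech = min(max(p_speech, 1e-6), 1.0 - 1e-6)
            if is_speech:
                score += frag["speech_ll"] + math.log(p_speech)
            else:
                score += frag["background_ll"] + math.log(1.0 - p_speech)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Toy input: one fragment near the target direction, one far from it.
fragments = [
    {"speech_ll": -1.0, "background_ll": -3.0, "azimuth": 2.0},
    {"speech_ll": -2.0, "background_ll": -2.0, "azimuth": 60.0},
]
labels, score = best_labelling(fragments, target_azimuth=0.0)
```

In this toy the fragments are scored independently, so the exhaustive enumeration is only there to mirror the idea of searching over labellings. Because the location term enters as a soft prior and not as a hard mask, a strong acoustic score can still override a poor location estimate, which is the sense in which unreliable cues can remain useful.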

Bibliographic reference. Ma, Ning / Barker, Jon / Christensen, Heidi / Green, Phil D. (2011): "Binaural cues for fragment-based speech recognition in reverberant multisource environments", in INTERSPEECH-2011, 1657-1660.