11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Combining Monaural and Binaural Evidence for Reverberant Speech Segregation

John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang

Department of Computer Science and Engineering, Ohio State University, USA

Most existing binaural approaches to speech segregation rely on spatial filtering. In environments with minimal reverberation and when sources are well separated in space, spatial filtering can achieve excellent results. However, in everyday environments, performance degrades substantially. To address these limitations, we incorporate monaural analysis within a binaural segregation system. We use monaural cues to perform both local and across-frequency grouping of mixture components, allowing for a more robust application of spatial filtering. We propose a novel framework in which we combine monaural grouping evidence and binaural localization evidence in a linear model for the estimation of the ideal binary mask. Results indicate that, with appropriately designed features that capture both monaural and binaural evidence, an extremely simple model achieves a signal-to-noise ratio improvement of up to 4 dB relative to using spatial filtering alone.
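
As a rough illustration of the idea described above, the following Python sketch computes an ideal binary mask from known target and interference energies, then estimates a mask from a linear combination of one monaural and one binaural evidence value per time-frequency unit. It is not the authors' implementation: the function names, the 0 dB threshold, the feature values, the weights, and the bias are all assumptions chosen for illustration, and the abstract does not specify how the features are computed or how the model is trained.

```python
import numpy as np

def ideal_binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Label a time-frequency unit 1 when the local target-to-interference
    ratio exceeds threshold_db (0 dB is an assumed, commonly used choice)."""
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (interference_energy + eps))
    return (local_snr_db > threshold_db).astype(int)

def estimate_mask(monaural_evidence, binaural_evidence, w_mono, w_bin, bias):
    """Hypothetical linear model: a unit is assigned to the target when the
    weighted sum of the two evidence values plus a bias is positive."""
    score = w_mono * monaural_evidence + w_bin * binaural_evidence + bias
    return (score > 0).astype(int)

# Toy 2 x 3 grid of time-frequency units (made-up energies, not real data).
target = np.array([[4.0, 1.0, 9.0],
                   [0.5, 2.0, 1.0]])
interference = np.array([[1.0, 2.0, 1.0],
                         [1.0, 1.0, 4.0]])
ibm = ideal_binary_mask(target, interference)

# Made-up evidence values: positive favors the target, negative the interference.
monaural = np.array([[0.8, -0.2, 0.1],
                     [-0.5, 0.3, -0.9]])
binaural = np.array([[0.5, -0.6, 0.4],
                     [0.2, 0.1, -0.3]])

# Weights and bias are illustrative placeholders, not learned parameters.
combined = estimate_mask(monaural, binaural, w_mono=1.0, w_bin=1.0, bias=0.0)
binaural_only = estimate_mask(monaural, binaural, w_mono=0.0, w_bin=1.0, bias=0.0)

print("ideal mask:\n", ibm)
print("combined estimate:\n", combined)
print("binaural-only estimate:\n", binaural_only)
```

In this toy example the binaural evidence alone mislabels one unit, and adding the monaural term corrects it. This only shows the mechanics of combining the two kinds of evidence and says nothing about performance on real reverberant mixtures.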


Bibliographic reference. Woodruff, John / Prabhavalkar, Rohit / Fosler-Lussier, Eric / Wang, DeLiang (2010): "Combining monaural and binaural evidence for reverberant speech segregation", in INTERSPEECH-2010, 406-409.