INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

An Auditory Inspired Multimodal Framework for Speech Enhancement

Majid Mirbagheri (1), Sahar Akram (1), Shihab Shamma (1,2)

(1) Institute for Systems Research; (2) Department of Electrical and Computer Engineering;
University of Maryland, College Park, MD, USA

A new multimodal framework for speech enhancement in noisy environments, based on a model of the human auditory system, is proposed in this paper. Unlike existing engineering architectures, each of which is specifically designed for a certain type of speech sensor (extracted pitch, visual cues, etc.), the proposed model can integrate cues of different types into the enhancement system by introducing the notion of temporal coherence. Short-time coherence coefficients (STCC) between sound components and cues, computed through an estimate of mutual information, serve as a measure of target-speech dominance and consequently determine the gain coefficients. Objective evaluation results for two exemplars in this framework show that the new methodology is effective in practice.

Index Terms: speech enhancement, multimodal, mutual information, auditory
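The abstract's core idea, computing short-time mutual information between sound components and an external cue and turning the resulting coherence scores into gains, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the histogram MI estimator, window/hop/bin sizes, the soft-mask gain mapping, and the synthetic cue signal are all assumptions made for the sketch.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of mutual information in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def short_time_coherence(component, cue, win=256, hop=128, bins=8):
    """Sliding-window MI between a sound component and a cue track
    (a stand-in for the paper's STCC; parameters are illustrative)."""
    n = min(len(component), len(cue))
    return np.array([
        mutual_information(component[i:i + win], cue[i:i + win], bins)
        for i in range(0, n - win + 1, hop)
    ])

def soft_gains(stcc_target, stcc_interferer, eps=1e-12):
    """Map relative coherence to per-frame gains in [0, 1]
    (a simple soft-mask assumption, not the paper's exact rule)."""
    return stcc_target / (stcc_target + stcc_interferer + eps)

# Toy example: the component that tracks the cue earns higher coherence.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2048)
cue = np.sin(2 * np.pi * 5.0 * t)                  # e.g. a pitch-like track
target = cue + 0.1 * rng.standard_normal(t.size)   # coherent with the cue
noise = rng.standard_normal(t.size)                # incoherent with the cue

stcc_target = short_time_coherence(target, cue)
stcc_noise = short_time_coherence(noise, cue)
gains = soft_gains(stcc_target, stcc_noise)        # high gain -> keep frame
```

In this toy setting the coherent component receives consistently higher STCC values, so its gains stay near one while incoherent frames are attenuated, mirroring how the framework uses coherence as a measure of target dominance.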


Bibliographic reference.  Mirbagheri, Majid / Akram, Sahar / Shamma, Shihab (2012): "An auditory inspired multimodal framework for speech enhancement", In INTERSPEECH-2012, 158-161.