11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Bimodal Coherence Based Scale Ambiguity Cancellation for Target Speech Extraction and Enhancement

Qingju Liu, Wenwu Wang, Philip Jackson

University of Surrey, UK

We present a novel method for extracting target speech from auditory mixtures using bimodal coherence. The coherence is statistically characterised by a Gaussian mixture model (GMM) in an off-line training process, using robust features obtained from audio-visual speech. We then use the bimodal coherence to adjust the ICA-separated spectral components in the time-frequency domain, mitigating the scale ambiguities in different frequency bins. We tested our algorithm on the XM2VTS database, and the results show that the proposed algorithm improves performance in terms of signal-to-interference ratio measurements.
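
The evaluation metric named in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of signal-to-interference ratio (SIR) as the ratio of target power to residual interference power, in decibels. The function name `sir_db`, the synthetic Gaussian signals, and the 0.1 attenuation factor are all hypothetical choices for illustration; none of them comes from the paper, and this is not the authors' evaluation code.

```python
import math
import random


def sir_db(target, interference):
    """Signal-to-interference ratio in dB: 10 * log10(P_target / P_interference)."""
    p_target = sum(x * x for x in target)
    p_interference = sum(x * x for x in interference)
    return 10.0 * math.log10(p_target / p_interference)


# Synthetic stand-ins for a target signal and an interfering signal (assumption).
random.seed(0)
target = [random.gauss(0.0, 1.0) for _ in range(1000)]
interference = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Hypothetical separation stage that attenuates the interference amplitude by 10x.
residual = [0.1 * x for x in interference]

sir_before = sir_db(target, interference)
sir_after = sir_db(target, residual)
improvement = sir_after - sir_before  # a 10x amplitude reduction is a 20 dB gain

print(f"SIR before: {sir_before:.2f} dB")
print(f"SIR after:  {sir_after:.2f} dB")
print(f"Improvement: {improvement:.2f} dB")
```

Because power scales with the square of amplitude, attenuating the interference by a factor of 10 in amplitude raises the SIR by 20 dB regardless of the particular signals used.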

Bibliographic reference.  Liu, Qingju / Wang, Wenwu / Jackson, Philip (2010): "Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement", In INTERSPEECH-2010, 438-441.