We present a novel method for extracting target speech from auditory mixtures using bimodal coherence. The coherence is characterised statistically by a Gaussian mixture model (GMM), trained off-line on robust features obtained from audio-visual speech. We then use the bimodal coherence to adjust the spectral components separated by independent component analysis (ICA) in the time-frequency domain, mitigating the scale ambiguities across frequency bins. We tested the algorithm on the XM2VTS database; the results show that it improves performance as measured by the signal-to-interference ratio.
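The abstract does not state how the signal-to-interference ratio is computed. As a minimal sketch, assuming the common energy-ratio definition in decibels, it could be measured as below. The function name `sir_db` and the toy sinusoids are hypothetical illustrations, not taken from the paper or from XM2VTS.

```python
import math

def sir_db(target, interference):
    """Assumed definition: 10*log10(target energy / interference energy), in dB."""
    target_energy = sum(x * x for x in target)
    interference_energy = sum(x * x for x in interference)
    return 10.0 * math.log10(target_energy / interference_energy)

# Toy signals (hypothetical): one second at 8 kHz.
fs = 8000
target = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
# The residual interference has one tenth of the target amplitude.
interference = [0.1 * math.sin(2 * math.pi * 330 * n / fs) for n in range(fs)]

print(round(sir_db(target, interference), 3))
```

An amplitude ratio of 10 corresponds to an energy ratio of 100, so the toy example evaluates to about 20 dB. A higher value after processing would indicate better suppression of the interfering source.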
Bibliographic reference. Liu, Qingju / Wang, Wenwu / Jackson, Philip (2010): "Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement", In INTERSPEECH-2010, 438-441.