AVSP 2003 - International Conference on Audio-Visual Speech Processing

September 4-7, 2003
St. Jorioz, France

Further Experiments on Audio-Visual Speech Source Separation

David Sodoyer (1), Laurent Girin (1), Christian Jutten (2), Jean-Luc Schwartz (1)

(1) Speech Communication Institute (ICP), CNRS UMR 5009, (2) Image and Signal Processing Laboratory (LIS), CNRS UMR 5083 INPG, Grenoble, France

Looking at the speaker's face seems to help listeners hear a speech signal and extract it from competing sources before identification. This suggests that new speech enhancement or extraction techniques could be developed by exploiting the audio-visual coherence of speech stimuli. In this paper, we present a set of experiments on a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms. We show, in the case of additive mixtures, that this algorithm outperforms classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm to select the "best" sensor with respect to a target source.
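The abstract does not detail the audio-visual coherence criterion itself, but the classical blind separation baseline it builds on can be illustrated. Below is a minimal sketch of purely audio blind separation of an additive two-sensor, two-source mixture (the determined case mentioned above), using whitening followed by a kurtosis-based FastICA fixed point. The signals, mixing matrix, and nonlinearity are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
# Two independent stand-in "sources" (e.g., a tone and broadband noise)
s1 = np.sin(2 * np.pi * 0.013 * np.arange(n))
s2 = rng.uniform(-1.0, 1.0, n)
S = np.vstack([s1, s2])

# Additive mixture with an assumed (unknown to the separator) mixing matrix
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
X = A @ S

# Whitening: center, then decorrelate and normalize the sensor signals
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / n
d, E = np.linalg.eigh(cov)
Z = (E @ np.diag(d ** -0.5) @ E.T) @ Xc

def fastica(Z, n_comp=2, iters=200):
    """One-unit FastICA with deflation, kurtosis nonlinearity g(u) = u**3."""
    W = np.zeros((n_comp, Z.shape[0]))
    for i in range(n_comp):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(iters):
            wz = w @ Z
            # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, with E[g'] = 3
            w_new = (Z * wz ** 3).mean(axis=1) - 3 * w
            # Deflation: project out previously found components
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            w = w_new
        W[i] = w
    return W

W = fastica(Z)
Y = W @ Z  # estimated sources, recovered up to permutation and scale
```

A purely audio criterion like this cannot tell the separator *which* recovered source is the target speech; the audio-visual coherence described in the paper addresses exactly that indeterminacy.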

Full Paper

Bibliographic reference.  Sodoyer, David / Girin, Laurent / Jutten, Christian / Schwartz, Jean-Luc (2003): "Further experiments on audio-visual speech source separation", in AVSP 2003, 145-150.