8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Extracting an AV Speech Source from a Mixture of Signals

David Sodoyer (1), Laurent Girin (1), Christian Jutten (2), Jean-Luc Schwartz (1)

(1) ICP-CNRS, France
(2) LIS-CNRS, France

We present a new approach to the source separation problem for multiple speech signals. Using the additional visual information provided by the speaker's face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We define a statistical model of the joint probability of visual and spectral audio input to quantify this audio-visual coherence. Separation is then achieved by maximising this joint probability. Experiments on additive mixtures of 2, 3 and 5 sources show that the algorithm performs well and systematically outperforms the classical blind source separation (BSS) algorithm JADE.
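The principle of "maximise the audio-visual probability of the separated output" can be illustrated with a toy example. The sketch below is hypothetical and is not the authors' model. Every choice in it is an assumption made for illustration: synthetic signals whose frame loudness follows a made-up lip-aperture track, per-frame log-RMS energy standing in for the spectral audio features, a single joint Gaussian standing in for the AV probability model, and a two-source instantaneous mixture whose separating vector `w = (cos θ, sin θ)` is found by grid search.

```python
import math
import random

FRAME = 64      # samples per frame (hypothetical analysis frame length)
N_FRAMES = 200  # number of frames


def make_sources(rng):
    """Synthetic data (assumption): a 'speech' source whose frame loudness
    follows a hypothetical lip-aperture track, plus an independent interferer."""
    lips = [rng.uniform(0.1, 1.0) for _ in range(N_FRAMES)]
    other = [rng.uniform(0.1, 1.0) for _ in range(N_FRAMES)]
    s1 = [lips[t] * rng.gauss(0.0, 1.0) for t in range(N_FRAMES) for _ in range(FRAME)]
    s2 = [other[t] * rng.gauss(0.0, 1.0) for t in range(N_FRAMES) for _ in range(FRAME)]
    return lips, s1, s2


def features(y, lips):
    """Per-frame (log RMS of the scale-normalised signal, log lip aperture)."""
    mean_power = sum(v * v for v in y) / len(y)
    feats = []
    for t in range(N_FRAMES):
        frame = y[t * FRAME:(t + 1) * FRAME]
        e = sum(v * v for v in frame) / FRAME
        feats.append((0.5 * math.log(e / mean_power), math.log(lips[t])))
    return feats


def fit_gaussian(feats):
    """Fit a 2-D Gaussian (means and covariance) to audio-visual feature pairs."""
    n = len(feats)
    ma = sum(a for a, _ in feats) / n
    mv = sum(v for _, v in feats) / n
    saa = sum((a - ma) ** 2 for a, _ in feats) / n
    svv = sum((v - mv) ** 2 for _, v in feats) / n
    sav = sum((a - ma) * (v - mv) for a, v in feats) / n
    return ma, mv, saa, svv, sav


def log_likelihood(feats, model):
    """Sum of joint log-densities log p(audio, visual) over all frames."""
    ma, mv, saa, svv, sav = model
    det = saa * svv - sav * sav
    total = 0.0
    for a, v in feats:
        da, dv = a - ma, v - mv
        # Mahalanobis term via the closed-form inverse of the 2x2 covariance
        q = (svv * da * da - 2 * sav * da * dv + saa * dv * dv) / det
        total += -0.5 * q - 0.5 * math.log(det) - math.log(2 * math.pi)
    return total


def separate(x1, x2, lips, model, steps=180):
    """Grid search over w = (cos theta, sin theta); keep the output that is
    most probable under the AV model (w and -w give the same features)."""
    best_theta, best_ll = 0.0, -math.inf
    for k in range(steps):
        theta = math.pi * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y = [c * a + s * b for a, b in zip(x1, x2)]
        ll = log_likelihood(features(y, lips), model)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


rng = random.Random(0)
# Train the AV model on a separate, clean recording of the target speaker.
lips_tr, s1_tr, _ = make_sources(rng)
model = fit_gaussian(features(s1_tr, lips_tr))

# Test mixture x = A s with an assumed 2x2 mixing matrix A = [[1, 0.6], [0.5, 1]].
lips, s1, s2 = make_sources(rng)
x1 = [a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.5 * a + b for a, b in zip(s1, s2)]
theta = separate(x1, x2, lips, model)
theta_star = math.pi - math.atan(0.6)  # direction that cancels s2 exactly
print(f"estimated theta = {theta:.3f}, ideal theta = {theta_star:.3f}")
```

The interferer is uncorrelated with the lip track, so any leakage of `s2` into the output breaks the tight relation between frame energy and lip aperture that the Gaussian has learned, and the likelihood drops. The maximum should therefore fall close to the angle that cancels `s2`. A realistic system would replace the grid search with a numerical optimiser, use richer spectral and lip features, and extend the separating vector to more than two sensors.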


Bibliographic reference.  Sodoyer, David / Girin, Laurent / Jutten, Christian / Schwartz, Jean-Luc (2003): "Extracting an AV speech source from a mixture of signals", In EUROSPEECH-2003, 1393-1396.