ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)

September 26-27, 1997
Rhodes, Greece

Can The Visual Input Make The Audio Signal "Pop Out" In Noise ? A First Study of The Enhancement of Noisy VCV Acoustic Sequences by Audio-Visual Fusion

L. Girin, Jean-Luc Schwartz, G. Feng

Institut de la Communication Parlée, UPRESA 5009, INPG/ENSERG/Université Stendhal, Grenoble, France

This paper deals with a noisy speech enhancement technique based on the fusion of auditory and visual information. We first relate this approach to experimental data suggesting the existence of an "audiovisual scene analysis module". We then present its implementation for vowel-consonant-vowel (VCV) transitions corrupted with white noise (four vowels and six plosives). A first evaluation of the system in this context is presented, including informal listening tests, distance measures and Gaussian classification scores. The results show that the vocalic parts of the signals are well enhanced, while the consonantal parts are not yet improved by the procedure. We propose a possible direction for dealing with this problem.


Bibliographic reference. Girin, L. / Schwartz, Jean-Luc / Feng, G. (1997): "Can the visual input make the audio signal 'pop out' in noise ? a first study of the enhancement of noisy VCV acoustic sequences by audio-visual fusion", In AVSP-1997, 37-40.