7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Audio-Visual Scene Analysis: Evidence for a "Very-Early" Integration Process in Audio-Visual Speech Perception

Jean-Luc Schwartz, Frédéric Berthommier, Christophe Savariaux

Institut de la Communication Parlée/INPG, France

Recent experiments suggest that audio-visual interaction in speech perception could begin at a very early level, at which the visual input improves the detection of speech sounds embedded in noise [1]. We show here that this "speech detection" benefit may produce a "speech identification" benefit distinct from lipreading per se. The experimental trick consists in using a series of lip gestures compatible with a number of different audio configurations, e.g. [y u ty tu ky ku dy du gy gu] in French. We show that visual identification of this corpus is at chance, but when vision is combined with the audio signal embedded in a high level of cocktail-party noise, it improves the identification of one phonetic feature, namely plosive voicing. We discuss this result in terms of audio-visual scene analysis.


Bibliographic reference.  Schwartz, Jean-Luc / Berthommier, Frédéric / Savariaux, Christophe (2002): "Audio-visual scene analysis: evidence for a 'very-early' integration process in audio-visual speech perception", in ICSLP-2002, 1937-1940.