7th International Conference on Spoken Language Processing
September 16-20, 2002
Recent experiments suggest that audio-visual interaction in speech perception may begin at a very early level, at which visual input improves the detection of speech sounds embedded in noise. We show here that this "speech detection" benefit can produce a "speech identification" benefit distinct from lipreading per se. The experimental trick consists of using a set of lip gestures compatible with a number of different audio configurations, e.g. [y u ty tu ky ku dy du gy gu] in French. We show that visual identification of this corpus is at chance, yet when the visual input is added to the sound embedded in a large amount of cocktail-party noise, vision improves the identification of one phonetic feature, namely plosive voicing. We discuss this result in terms of audio-visual scene analysis.
Bibliographic reference. Schwartz, Jean-Luc / Berthommier, Frédéric / Savariaux, Christophe (2002): "Audio-visual scene analysis: evidence for a 'very-early' integration process in audio-visual speech perception", In ICSLP-2002, 1937-1940.