Auditory-Visual Speech Processing (AVSP) 2013

Annecy, France
August 29 - September 1, 2013

Data and Simulations about Audiovisual Asynchrony and Predictability in Speech Perception

Jean-Luc Schwartz, Christophe Savariaux

GIPSA-Lab, Speech and Cognition Department, UMR 5216 CNRS Grenoble-Alps University, France

Since a paper by Chandrasekaran et al. (2009), an increasing number of neuroscience papers have capitalized on the assumption that visual speech is typically 150 ms ahead of auditory speech. However, the estimation of audiovisual asynchrony by Chandrasekaran et al. is valid only in very specific cases: for isolated CV syllables or at the beginning of a speech utterance. We present simple audiovisual data on plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise when syllables are chained in sequences, as they typically are in most parts of a natural speech utterance. We then discuss how the natural coordination between sound and image (combining cases of lead and lag of the visual input) is reflected in the so-called temporal integration window for audiovisual speech perception (van Wassenhove et al., 2007). We conclude with a computational proposal about predictive coding in such sequences, showing that the visual input may actually provide and enhance predictions even if it is nearly synchronous with the auditory input.
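The concluding claim, that a visual input can sharpen predictions even without any temporal lead, can be illustrated with a minimal sketch. The code below is our own toy illustration, not the authors' model: it fuses an auditory and a perfectly synchronous visual estimate of a syllable onset by reliability weighting (the noise levels `sigma_a` and `sigma_v` are arbitrary assumptions), and shows that the fused estimate is more precise than the auditory estimate alone.

```python
import numpy as np

# Toy sketch (assumed parameters, not from the paper): reliability-weighted
# fusion of auditory and visual estimates of a syllable onset time.
rng = np.random.default_rng(0)
true_onset = 0.0          # ms, syllable onset
sigma_a, sigma_v = 20.0, 30.0  # assumed sensory noise levels (ms)
n = 10_000

audio = rng.normal(true_onset, sigma_a, n)
video = rng.normal(true_onset, sigma_v, n)  # synchronous: zero visual lead

# Optimal (inverse-variance) weighting of the two cues
w_a = sigma_v**2 / (sigma_a**2 + sigma_v**2)
fused = w_a * audio + (1 - w_a) * video

# Even with zero audiovisual lag, fusion reduces the estimate's variability
print(f"auditory std: {audio.std():.1f} ms, fused std: {fused.std():.1f} ms")
```

The point of the sketch is only that extra information, not temporal precedence, is what makes the visual stream predictive: the fused standard deviation is below the auditory one despite the two cues being exactly synchronous.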

Index Terms: audiovisual asynchrony, temporal integration window, predictive coding, visual lead/lag, visual prediction

Full Paper

Bibliographic reference. Schwartz, Jean-Luc / Savariaux, Christophe (2013): "Data and simulations about audiovisual asynchrony and predictability in speech perception", In AVSP-2013, 147-152.