ISCA Archive AVSP 2013

Data and simulations about audiovisual asynchrony and predictability in speech perception

Jean-Luc Schwartz, Christophe Savariaux

Since a paper by Chandrasekaran et al. (2009), an increasing number of neuroscience papers have capitalized on the assumption that visual speech typically leads auditory speech by 150 ms. However, Chandrasekaran et al.'s estimate of audiovisual asynchrony is valid only in very specific cases: for isolated CV syllables or at the beginning of a speech utterance. We present simple audiovisual data on plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise when syllables are chained in sequences, as they typically are in most parts of a natural speech utterance. We then discuss how the natural coordination between sound and image (combining cases of lead and lag of the visual input) is reflected in the so-called temporal integration window for audiovisual speech perception (van Wassenhove et al., 2007). We conclude with a computational proposal about predictive coding in such sequences, showing that the visual input may actually provide and enhance predictions even when it is quite synchronous with the auditory input.

Index Terms: audiovisual asynchrony, temporal integration window, predictive coding, visual lead/lag, visual prediction


Cite as: Schwartz, J.-L., Savariaux, C. (2013) Data and simulations about audiovisual asynchrony and predictability in speech perception. Proc. Auditory-Visual Speech Processing, 147-152.

@inproceedings{schwartz13_avsp,
  author={Jean-Luc Schwartz and Christophe Savariaux},
  title={{Data and simulations about audiovisual asynchrony and predictability in speech perception}},
  year=2013,
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={147--152}
}