ISCA Archive AVSP 2001

Speech intelligibility derived from asynchronous processing of auditory-visual information

Ken W. Grant, Steven Greenberg

The current study examines the temporal parameters associated with cross-modal integration of auditory-visual information for sentential material. The speech signal was filtered into 1/3-octave channels, all of which were discarded except for a low-frequency (298-375 Hz) band and a high-frequency (4762-6000 Hz) band. The intelligibility of this audio-only signal ranged between 9% and 31% for nine normal-hearing subjects. Visual-alone presentation of the same material yielded intelligibility between 1% and 22%. When the audio and video signals were combined and presented in synchrony, intelligibility climbed to an average of 63%. When the audio signal led the video, intelligibility declined appreciably for even the shortest asynchrony of 40 ms, falling to an asymptotic level of performance for asynchronies of approximately 120 ms and longer. In contrast, when the video signal led the audio, intelligibility remained relatively stable for onset asynchronies up to 160-200 ms. Hence, there is a marked asymmetry in the integration of audio and visual information that has important implications for sensory-based models of auditory-visual speech processing.
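The two-band audio stimulus and the audio-video asynchrony manipulation described above can be sketched in code. The band edges (298-375 Hz, 4762-6000 Hz) and the 40-200 ms offset range come from the abstract; the sampling rate, filter type (Butterworth band-pass), and filter order are assumptions for illustration only, not the authors' actual processing chain.

import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 16000  # assumed sampling rate in Hz (not specified in the abstract)

def third_octave_band(speech, low_hz, high_hz, fs=FS, order=4):
    """Band-pass filter a speech waveform to one narrow band (assumed Butterworth design)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, speech)

def two_band_stimulus(speech):
    """Keep only the low (298-375 Hz) and high (4762-6000 Hz) 1/3-octave bands."""
    low = third_octave_band(speech, 298, 375)
    high = third_octave_band(speech, 4762, 6000)
    return low + high

def shift_audio(audio, lead_ms, fs=FS):
    """Offset the audio track relative to the video: positive lead_ms means audio leads."""
    n = int(round(abs(lead_ms) * fs / 1000))
    if n == 0:
        return audio
    if lead_ms > 0:  # audio leads: drop the first n samples, pad the end
        return np.concatenate([audio[n:], np.zeros(n)])
    return np.concatenate([np.zeros(n), audio[:-n]])  # video leads: delay the audio

The asymmetry reported in the abstract would appear as intelligibility dropping sharply for positive offsets (audio leading) beyond 40 ms, while remaining roughly stable for negative offsets (video leading) out to 160-200 ms.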


Cite as: Grant, K.W., Greenberg, S. (2001) Speech intelligibility derived from asynchronous processing of auditory-visual information. Proc. Auditory-Visual Speech Processing, 132-137

@inproceedings{grant01_avsp,
  author={Ken W. Grant and Steven Greenberg},
  title={{Speech intelligibility derived from asynchronous processing of auditory-visual information}},
  year=2001,
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={132--137}
}