Auditory-Visual Speech Processing 2005

British Columbia, Canada
July 24-27, 2005

Perception of Congruent and Incongruent Audiovisual Speech Stimuli

Jintao Jiang (1), Lynne E. Bernstein (1,2), Edward T. Auer Jr. (3)

(1) Department of Communication Neuroscience, House Ear Institute, Los Angeles, CA, USA
(2) National Science Foundation, Social, Behavioral, and Economic Sciences Directorate, Arlington, VA, USA
(3) Department of Speech-Language-Hearing: Sciences & Disorders, Univ. of Kansas, Lawrence, KS, USA

Previous studies of audiovisual (AV) speech integration have used behavioral methods to examine perception of congruent and incongruent AV speech stimuli. Such studies have investigated responses to a relatively limited set of the possible incongruent combinations of AV speech stimuli. A central issue in examining a wider range of incongruent AV speech stimuli is developing a systematic alignment method that works with a wide variety of segments. In the present study, we investigated the use of three different landmarks (consonant onset, vowel onset, and minimum distance) for aligning incongruent AV stimuli. Acoustic /ba/ or /la/ syllables were dubbed onto eight visual consonant-/a/ syllables that spanned different places and manners of articulation. The AV stimuli were presented to ten participants. Results indicated that the effect of alignment landmark was not significant. The distance measures were found to be related to visual influence. Acoustic /ba/ tokens were more influenced by visual stimuli than acoustic /la/ tokens. Visual influence on the acoustic /ba/ tokens was mainly of the McGurk type and/or of voicing confusion, whereas visual influence on the acoustic /la/ tokens was mainly of the combination type (/ba/ + /la/ = /bla/).

Full Paper

Bibliographic reference.  Jiang, Jintao / Bernstein, Lynne E. / Auer Jr., Edward T. (2005): "Perception of congruent and incongruent audiovisual speech stimuli", in AVSP-2005, 39-44.