Information from the acoustic speech signal and the talking face is integrated into a unified percept. This is demonstrated by the McGurk effect, in which discrepant visual articulation changes the auditory perception of a consonant. We studied the acoustic (A) and visual (V) phonetic features that contribute to audiovisual speech perception by measuring the McGurk effect in two vowel contexts, [a] and [e], at various levels of acoustic noise. The McGurk stimuli consisted of an acoustic [p] presented with a visual [k]. This combination is generally heard as [t] (a fusion percept) or as [k] (a visually dominant percept). The stimulus A[apa]V[aka] was most often heard as [aka], and these percepts increased with noise level. The stimulus A[epe]V[eke] was heard mostly as the fusion [ete], but in high noise also as [eke]. A phonetic analysis showed that, in the [e] context, the A[p] and V[k] stimulus features were close to those of [t], explaining why fusions were frequent. In the [a] context, the visual stimulus had clear features of [k], while the features of the acoustic component were less distinctive, resulting in visual dominance, particularly in noise. These results show how audiovisual integration depends on the features of acoustic and visual speech.
Bibliographic reference. Tiippana, Kaisa / Tiainen, Mikko / Vainio, Lari / Vainio, Martti (2013): "Acoustic and visual phonetic features in the McGurk effect, an audiovisual speech illusion", in INTERSPEECH-2013, 1634-1638.