INTERSPEECH 2006 - ICSLP
This study examines audio-visual (AV) perception of second-language (L2) speech, with the goal of investigating the extent to which the auditory and visual input modalities are integrated in the processing of unfamiliar L2 speech. Native (Canadian English) and nonnative (Mandarin) perceivers' responses were collected for a set of fricative-initial syllables presented in quiet and in cafe-noise backgrounds, under four presentation conditions: congruent audio-visual (AVc), incongruent audio-visual (AVi), audio-only (A), and visual-only (V). Results show that for both language groups, performance was better in the AVc condition than in the A or V condition, and better in quiet than in the cafe-noise background. A comparison of native and nonnative performance revealed that the Mandarin participants showed (1) poorer identification of the L2 interdental fricatives, (2) greater reliance on visual information, even when auditory information was available, and (3) a higher percentage of McGurk responses to incongruent AV speech. These findings indicate that although the nonnative perceivers were able to use visual information, they failed to adopt the visual cues that are linguistically characteristic of the L2 sounds, suggesting a language-specific AV processing pattern. However, similarities between the two language groups also point to possible perceptual universals. Together, these results suggest an integrated network for speech processing across modalities.
Bibliographic reference. Wang, Yue / Behne, Dawn / Jiang, Haisheng / Danyluck, Chad (2006): "Native and nonnative audio-visual perception of English fricatives in quiet and cafe-noise backgrounds", In INTERSPEECH-2006, paper 1798-Tue1BuP.8.