14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Effects of Mouth-Only and Whole-Face Displays on Audio-Visual Speech Perception in Noise: Is the Vision of a Talker's Full Face Truly the Most Efficient Solution?

Grozdana Erjavec, Denis Legros

Université Paris 8, France

The goal of the present study was to establish which type of visual input (featural vs. holistic) and which mode of presentation best facilitate audio-visual speech perception. Sixteen participants were asked to repeat syllables that were strongly or mildly degraded acoustically, presented in an auditory condition and three audio-visual conditions; of the latter, one contained holistic and two contained featural visual information. The two featural audio-visual conditions differed in how the talker's mouth was presented. Data were collected on correct repetitions and on the duration of participants' fixations in the talker's mouth area. The results showed that the facilitative effect of visual information on speech perception depended on both the level of auditory degradation and the visual presentation format, whereas eye-movement behavior was affected only by the visual input format. Featural information, when presented in a format containing no high-contrast elements, was overall the most efficient visual aid for speech perception. It was also in this format that fixations on the talker's mouth lasted longest. The results are interpreted with an emphasis on the differences in attentional and perceptual processes that the different visual input formats most likely induced.


Bibliographic reference.  Erjavec, Grozdana / Legros, Denis (2013): "Effects of mouth-only and whole-face displays on audio-visual speech perception in noise: is the vision of a talker's full face truly the most efficient solution?", In INTERSPEECH-2013, 1629-1633.