Auditory-Visual Speech Processing 2005

British Columbia, Canada
July 24-27, 2005

Visual Contribution to Speech Perception: Measuring the Intelligibility of Talking Heads

Slim Ouni, Michael M. Cohen, Hope Ishak, Dominic W. Massaro

Perceptual Science Laboratory - University of California at Santa Cruz, CA, USA

Animated agents are increasingly common in speech science research and applications. An important challenge is to evaluate the effectiveness of such an agent in terms of the intelligibility of its visible speech. Sumby and Pollack (1954) proposed a metric to describe the benefit provided by the face relative to auditory speech presented alone. We extend this metric to describe the benefit provided by a synthetic animated face relative to the benefit provided by a natural face. The validity of the metric is tested in a new experiment in which auditory speech is presented at five different noise levels and is paired with either our synthetic talker Baldi or a natural talker (the standard). A valid metric would allow direct comparisons across different experiments. It would also measure the benefit of a synthetic animated face relative to a natural face, and show how this benefit varies as a function of the type of synthetic face, the test items (e.g., syllables versus sentences, viseme class), different individuals, and applications.
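
The abstract does not spell out either formula, so the following is only a minimal sketch under stated assumptions. It assumes the Sumby and Pollack (1954) measure takes its commonly cited form, the audiovisual gain divided by the room left for improvement, (AV - A) / (1 - A), with scores expressed as proportions correct. It further assumes that the extended metric is the ratio of the synthetic face's gain to the natural face's gain at the same noise level. The function names and the example scores are hypothetical and are not taken from the paper.

```python
def visual_contribution(av_score: float, a_score: float) -> float:
    """Gain from seeing the face, normalized by the room for improvement.

    Assumed form: (AV - A) / (1 - A), with scores as proportions correct.
    """
    if not 0.0 <= a_score < 1.0:
        raise ValueError("auditory-only score must be in [0, 1)")
    if not 0.0 <= av_score <= 1.0:
        raise ValueError("audiovisual score must be in [0, 1]")
    return (av_score - a_score) / (1.0 - a_score)


def relative_visual_contribution(
    synthetic_av: float, natural_av: float, a_score: float
) -> float:
    """Synthetic-face gain as a fraction of natural-face gain (assumed extension)."""
    natural_gain = visual_contribution(natural_av, a_score)
    if natural_gain == 0.0:
        raise ValueError("natural face gives no gain; the ratio is undefined")
    return visual_contribution(synthetic_av, a_score) / natural_gain


# Hypothetical scores at a single noise level (not data from the experiment).
auditory_only = 0.25
natural_face = 0.75
synthetic_face = 0.625

ratio = relative_visual_contribution(synthetic_face, natural_face, auditory_only)
print(f"relative visual contribution: {ratio:.2f}")
```

Under these assumptions, a ratio of 1 would mean the synthetic face helps as much as the natural face, and a ratio below 1 would mean it helps less. Because both gains share the same denominator at a given noise level, the ratio reduces to (synthetic AV - A) / (natural AV - A).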

Bibliographic reference. Ouni, Slim / Cohen, Michael M. / Ishak, Hope / Massaro, Dominic W. (2005): "Visual contribution to speech perception: measuring the intelligibility of talking heads", in AVSP-2005, 45-46.