Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Studies of Audiovisual Speech Perception Using Production-Based Animation

K. G. Munhall (1,2), C. Kroos (3), T. Kuratate (3), J. Lucero (4), M. Pitermann (1), Eric Vatikiotis-Bateson (3), H. Yehia (5)

(1) Dept. of Psychology, (2) Dept. of Otolaryngology, Queen's University, Kingston, Canada
(3) ATR International Information Sciences Division, Kyoto, Japan
(4) Dept. of Mathematics, University of Brasilia, Brazil
(5) Department of Engineering, Universidade Federal de Minas Gerais, Brazil

This paper summarizes our work at Queen's University and ATR Laboratories on cross-modal speech perception and production. Our approach has been to study these two sides of speech together and to use multimodal speech production data to parameterize and control audiovisual animation systems. We have pursued two approaches to production-based facial animation, one statistical and the other physical. In both cases, realistic talking-head animations are generated from continuous input of production data. The statistical method of audiovisual (AV) synthesis extends the multilinear techniques we developed for analyzing orofacial motion and speech acoustics to include the correlation between measured 3D positions on the face and deformation coefficients of the facial surface. In the physical approach, the dynamic form of the animation is determined by the biophysical characteristics of the animated object. The physical model consists of multiple structural layers: model skull and jaw surfaces, an orofacial muscle layer, and a three-layer polygon model of the soft tissue. In a series of studies using these animation approaches, we have examined the conditions under which simultaneous visual presentation enhances speech perception in noise. Our data show a distinction between visual prosody and segmental perception, and they demonstrate that our animated stimuli produce natural increases in speech intelligibility.


Bibliographic reference. Munhall, K. G. / Kroos, C. / Kuratate, T. / Lucero, J. / Pitermann, M. / Vatikiotis-Bateson, Eric / Yehia, H. (2000): "Studies of audiovisual speech perception using production-based animation", in ICSLP-2000, vol. 3, 7-10.