Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Audio-Visual Synthesis of Talking Faces from Speech Production Correlates

Takaaki Kuratate (1), Kevin G. Munhall (2), Philip E. Rubin (3), Eric Vatikiotis-Bateson (1), Hani Yehia (4)

(1) ATR HIP Res. Labs, Seika-cho, Kyoto, Japan
(2) Psychology Dept., Queen's University, Kingston, Ont., Canada
(3) Haskins Labs and Yale University, New Haven, CT, USA
(4) Electronic Engineering Dept., UFMG, Belo Horizonte, Brazil

This paper presents technical refinements and extensions of our system for correlating audible and visible components of speech behavior and subsequently using those correlates to generate realistic talking faces. The introduction of nonlinear estimation techniques has improved our ability to generate facial motion either from the speech acoustics or from orofacial muscle EMG. Also, preliminary evidence is given for the strong correlation found between 3D head motion and fundamental frequency (F0). Coupled with improved methods for deriving facial deformation parameters from static 3D face scans, more realistic talking faces are now being synthesized.


Bibliographic reference. Kuratate, Takaaki / Munhall, Kevin G. / Rubin, Philip E. / Vatikiotis-Bateson, Eric / Yehia, Hani (1999): "Audio-visual synthesis of talking faces from speech production correlates", in EUROSPEECH'99, 1279-1282.