Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Automatic Head Gesture Learning and Synthesis from Prosodic Cues

Stephen M. Chu, Thomas S. Huang

Beckman Institute and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA

We present a novel approach for automatically learning and synthesizing head gestures using prosodic features extracted from acoustic speech signals. A minimum-entropy hidden Markov model is employed to learn the 3-D head motion of a speaker. The result is a generative model that is compact and highly predictive. The model is further exploited to synchronize the head motion with a set of continuous prosodic observations and to capture the correspondence between the two by sharing its state machine. In synthesis, the prosodic features are used as the cue signal to drive the generative model so that 3-D head gestures can be inferred.
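The following is a minimal sketch, not the authors' implementation, of the idea of a state machine shared between a prosodic observation model and a head-motion output model: hypothetical prosodic features (pitch, energy) are Viterbi-decoded to a state sequence, and each state then emits its associated 3-D head rotation. All model parameters below are illustrative assumptions.

```python
import numpy as np

# Hypothetical 3-state model: transitions, initial distribution, per-state Gaussian
# means/variances over (pitch, energy), and per-state head rotation (pitch, yaw, roll).
A = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
pi = np.array([0.6, 0.3, 0.1])
prosody_mean = np.array([[120.0, 0.2],    # low pitch, quiet
                         [180.0, 0.5],    # mid pitch
                         [240.0, 0.9]])   # high pitch, loud
prosody_var = np.array([[400.0, 0.01],
                        [400.0, 0.02],
                        [400.0, 0.02]])
head_rotation = np.array([[ 0.0,  0.0, 0.0],   # neutral pose
                          [ 5.0, -3.0, 1.0],   # slight nod/turn
                          [12.0,  6.0, 2.0]])  # emphatic nod

def log_gaussian(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian, per state."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def viterbi(obs):
    """Most likely state sequence for a (T, 2) prosodic observation sequence."""
    T, K = len(obs), len(pi)
    logA = np.log(A)
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = np.log(pi) + log_gaussian(obs[0], prosody_mean, prosody_var)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA          # scores[i, j]: prev i -> cur j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(K)] + log_gaussian(obs[t], prosody_mean, prosody_var)
    states = np.zeros(T, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states

# Drive the shared state machine with a novel prosodic sequence and read out head rotations.
prosody = np.array([[125, 0.25], [175, 0.5], [235, 0.85], [245, 0.9], [130, 0.2]], float)
print(head_rotation[viterbi(prosody)])   # per-frame (pitch, yaw, roll) inferred from prosody
```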

A tracking algorithm based on the Bézier volume deformation model is implemented to recover the head motion. To evaluate the performance of the proposed system, we compare the true head motion with the prosody-inferred motion. The prosody-to-head-motion mapping acquired through learning is subsequently applied to animate a talking head. Very convincing head gestures are produced when novel prosodic cues from the same speaker are presented.
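As a hedged illustration of such a comparison (the paper's exact metric is not specified here), the sketch below scores an inferred head-rotation trajectory against a ground-truth one using per-axis RMSE and Pearson correlation; the trajectories are synthetic stand-ins.

```python
import numpy as np

def compare_trajectories(true_motion, inferred_motion):
    """Both arguments are (T, 3) arrays of head rotation angles (pitch, yaw, roll)."""
    err = inferred_motion - true_motion
    rmse = np.sqrt(np.mean(err ** 2, axis=0))
    corr = np.array([np.corrcoef(true_motion[:, k], inferred_motion[:, k])[0, 1]
                     for k in range(3)])
    return rmse, corr

# Synthetic example in place of tracked (true) and prosody-inferred trajectories.
t = np.linspace(0, 2 * np.pi, 100)
true_motion = np.stack([10 * np.sin(t), 5 * np.cos(t), 2 * np.sin(2 * t)], axis=1)
inferred_motion = true_motion + np.random.normal(0, 1.0, true_motion.shape)
rmse, corr = compare_trajectories(true_motion, inferred_motion)
print("RMSE per axis:", rmse)
print("Correlation per axis:", corr)
```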


Bibliographic reference.  Chu, Stephen M. / Huang, Thomas S. (2000): "Automatic head gesture learning and synthesis from prosodic cues", In ICSLP-2000, vol.1, 637-640.