Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Automatic Head Gesture Learning and Synthesis from Prosodic Cues
Stephen M. Chu, Thomas S. Huang
Beckman Institute and Department of Electrical and Computer Engineering,
University of Illinois at Urbana-Champaign, IL, USA
We present a novel approach to automatically learning and
synthesizing head gestures from prosodic features extracted from
acoustic speech signals. A minimum entropy hidden Markov
model is employed to learn the 3-D head motion of a speaker.
The result is a generative model that is compact and highly
predictive. The model is further used to synchronize the
head motion with a set of continuous prosodic observations and
to capture the correspondence between the two by sharing its state
machine. In synthesis, the prosodic features serve as the
cue signal that drives the generative model, so that 3-D head
gestures can be inferred.
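The synthesis step can be pictured as a decode-and-remap procedure: the prosodic observation sequence is decoded through the shared state machine, and each visited state emits its associated head pose. Below is a minimal sketch of that idea in Python/NumPy; the function and field names (synthesize_head_motion, prosody_means, motion_means, etc.) are hypothetical, the per-state prosody distributions are assumed to be diagonal Gaussians, and the minimum entropy training of the model itself (which, in the style of Brand's entropic estimation, favors low-entropy parameters over plain maximum likelihood) is not shown.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most-likely state path for one observation sequence.
    log_pi: (S,) initial log-probabilities; log_A: (S, S) transition
    log-probabilities; log_B: (T, S) per-frame emission log-likelihoods."""
    T, S = log_B.shape
    delta = log_pi + log_B[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # (prev state, next state)
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def synthesize_head_motion(prosody, model):
    """Map a (T, D) prosodic feature sequence to a (T, 3) head-rotation
    trajectory by decoding the shared state machine and emitting each
    state's head-pose mean.  All `model` fields are hypothetical."""
    mu = model["prosody_means"]              # (S, D) per-state means
    var = model["prosody_vars"]              # (S, D) diagonal variances
    diff = prosody[:, None, :] - mu[None]    # (T, S, D)
    log_B = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=-1)
    path = viterbi(np.log(model["pi"]), np.log(model["A"]), log_B)
    return model["motion_means"][path]       # piecewise-constant poses
```

In practice the piecewise-constant state output would be smoothed, or sampled from the state dynamics, before driving the talking head.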
A tracking algorithm based on the Bézier volume deformation
model is implemented to capture the head motion. To evaluate
the performance of the proposed system, we compare the true
head motion with the prosody-inferred motion. The prosody-to-head-motion
mapping acquired through learning is
subsequently applied to animate a talking head. Very
convincing head gestures are produced when novel prosodic
cues from the same speaker are presented.
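The abstract does not state which quantitative measure underlies the comparison of true and inferred motion; purely as an illustration, assuming the two trajectories are frame-aligned (T, 3) rotation sequences, one might report per-axis RMS error and Pearson correlation, as in this sketch:

```python
import numpy as np

def compare_motion(true_motion, inferred_motion):
    """Per-axis RMS error and Pearson correlation between the tracked
    head motion and the prosody-inferred motion; both inputs are
    frame-aligned (T, 3) arrays of head rotations."""
    err = inferred_motion - true_motion
    rmse = np.sqrt(np.mean(err ** 2, axis=0))
    corr = np.array([np.corrcoef(true_motion[:, k], inferred_motion[:, k])[0, 1]
                     for k in range(true_motion.shape[1])])
    return rmse, corr
```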
Chu, Stephen M. / Huang, Thomas S. (2000):
"Automatic head gesture learning and synthesis from prosodic cues",
in Proc. ICSLP-2000, vol. 1, pp. 637-640.