ISCA Archive ICSLP 2000

Automatic head gesture learning and synthesis from prosodic cues

Stephen M. Chu, Thomas S. Huang

We present a novel approach to automatically learn and synthesize head gestures using prosodic features extracted from acoustic speech signals. A minimum entropy hidden Markov model is employed to learn the 3-D head-motion of a speaker. The result is a generative model that is compact and highly predictive. The model is further exploited to synchronize the head-motion with a set of continuous prosodic observations and gather the correspondence between the two by sharing its state machine. In synthesis, the prosodic features are used as the cue signal to drive the generative model so that 3-D head gestures can be inferred.

A tracking algorithm based on the Bézier volume deformation model is implemented to track the head-motion. To evaluate the performance of the proposed system, we compare the true head-motion with the prosody-inferred motion. The prosody to head-motion mapping acquired through learning is subsequently applied to animate a talking head. Very convincing head gestures are produced when novel prosodic cues of the same speaker are presented.
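The synthesis step described above can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: it assumes a shared-state HMM in which each hidden state carries both a Gaussian prosody emission (used to decode the state path from the cue signal) and a per-state head-pose output (used to generate motion). All parameters here are hypothetical.

```python
import numpy as np

def viterbi(obs, pi, A, means, var):
    """Most likely state path for 1-D Gaussian emissions (log domain)."""
    T, n_states = len(obs), len(pi)
    # Log-likelihood of each observation under each state's Gaussian.
    ll = -0.5 * ((obs[:, None] - means[None, :]) ** 2) / var \
         - 0.5 * np.log(2 * np.pi * var)
    delta = np.log(pi) + ll[0]
    psi = np.zeros((T, n_states), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)   # scores[from, to]
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + ll[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Hypothetical two-state model: state 0 = low pitch / head tilted down,
# state 1 = high pitch / head raised. Values are illustrative only.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
prosody_means = np.array([100.0, 200.0])   # e.g. F0 in Hz (assumed)
prosody_var = 400.0
head_pitch_deg = np.array([-5.0, 10.0])    # per-state head rotation (assumed)

# Synthesis: decode states from a novel prosodic cue stream, then emit
# the head motion associated with each decoded state.
f0 = np.array([105.0, 98.0, 110.0, 190.0, 205.0, 198.0, 102.0, 95.0])
states = viterbi(f0, pi, A, prosody_means, prosody_var)
motion = head_pitch_deg[states]
print(states)   # decoded state sequence
print(motion)   # inferred head-pitch trajectory
```

Because the prosody and head-pose models share one state machine, decoding the state path from acoustics alone is enough to recover a synchronized motion trajectory.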


Cite as: Chu, S.M., Huang, T.S. (2000) Automatic head gesture learning and synthesis from prosodic cues. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 637-640

@inproceedings{chu00_icslp,
  author={Stephen M. Chu and Thomas S. Huang},
  title={{Automatic head gesture learning and synthesis from prosodic cues}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={1},
  pages={637--640}
}