Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Cloning Synthetic Talking Heads

Jialin Zhong, Joseph Olive

Language Modeling Department, Multimedia Communication Research Lab., Bell Labs, Lucent Technologies, Murray Hill, NJ, USA

The quality of Text-to-Visual-Speech synthesis is judged by how well the visual perception of the speech articulators matches the acoustic perception of speech. At the same time, different viewers often prefer different head models for subjective reasons. Traditional facial animation approaches tie the parameterization of the animation directly to the model, so switching the head model is difficult because a lengthy training process is required. In this paper, we present a method that creates a new talking head from an existing one without repeating the training process. This work assumes that the visible motion of the speech articulators can be described by a small set of feature points. By mapping the 3D trajectories of the feature points from the existing model to the new one, we can transfer the motion of the articulators. A morphing algorithm is then used to animate the new talking head from the trajectories of the feature points on the new model. The new talking head, though it looks different, preserves the perceptual quality of the original one.
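The abstract does not give the mapping or the morphing algorithm, so the following is only a minimal sketch of the two steps under stated assumptions, not the authors' method. It assumes that feature points on the two heads correspond one-to-one, that motion is transferred by scaling each feature point's displacement from a neutral pose by the per-axis size ratio of the two feature sets, and that the morph is a simple inverse-distance-weighted interpolation. The names `axis_scales`, `map_frame`, and `morph_vertices` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def axis_scales(src_neutral: List[Point], dst_neutral: List[Point]) -> Point:
    """Per-axis ratio of the bounding-box extents of the two neutral feature sets."""
    scales = []
    for axis in range(3):
        src_vals = [p[axis] for p in src_neutral]
        dst_vals = [p[axis] for p in dst_neutral]
        src_extent = max(src_vals) - min(src_vals)
        dst_extent = max(dst_vals) - min(dst_vals)
        # Assumption: fall back to 1.0 when the source features are flat on this axis.
        scales.append(dst_extent / src_extent if src_extent > 0 else 1.0)
    return (scales[0], scales[1], scales[2])


def map_frame(src_frame: List[Point], src_neutral: List[Point],
              dst_neutral: List[Point], scales: Point) -> List[Point]:
    """Transfer one frame: scale each source displacement, add it to the new neutral pose."""
    return [
        tuple(d[a] + scales[a] * (s[a] - n[a]) for a in range(3))
        for s, n, d in zip(src_frame, src_neutral, dst_neutral)
    ]


def morph_vertices(vertices: List[Point], dst_neutral: List[Point],
                   dst_frame: List[Point], power: float = 2.0) -> List[Point]:
    """Move mesh vertices by inverse-distance-weighted feature-point displacements."""
    moved = []
    for v in vertices:
        weights = []
        exact = None
        for i, f in enumerate(dst_neutral):
            dist2 = sum((v[a] - f[a]) ** 2 for a in range(3))
            if dist2 == 0.0:
                exact = i  # vertex coincides with a feature point
                break
            weights.append(1.0 / dist2 ** (power / 2.0))
        if exact is not None:
            # A vertex sitting on a feature point follows that point exactly.
            moved.append(tuple(dst_frame[exact]))
            continue
        total = sum(weights)
        moved.append(tuple(
            v[a] + sum(w * (t[a] - f[a])
                       for w, f, t in zip(weights, dst_neutral, dst_frame)) / total
            for a in range(3)
        ))
    return moved


# Toy example: the new head is twice the size of the existing one.
src_neutral = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
dst_neutral = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)]
src_frame = [(0, 0, 0), (2, 1, 0), (0, 2, 0), (0, 0, 2)]  # second point moves up by 1

scales = axis_scales(src_neutral, dst_neutral)
dst_frame = map_frame(src_frame, src_neutral, dst_neutral, scales)
new_mesh = morph_vertices([(4, 0, 0), (1.0, 1.0, 1.0)], dst_neutral, dst_frame)
```

Applying `map_frame` to every frame of a feature-point trajectory gives the trajectory on the new model, and `morph_vertices` then deforms the new mesh frame by frame. A production system would likely use a smoother interpolant and region-specific weights around the lips and jaw; the inverse-distance weighting here is chosen only for brevity.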


Bibliographic reference.  Zhong, Jialin / Olive, Joseph (1998): "Cloning Synthetic Talking Heads", In SSW3-1998, 287-292.