8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Video-Realistic Synthetic Speech With a Parametric Visual Speech Synthesizer

Sascha Fagel

Technical University Berlin, Germany

The author presents a new face module for MASSY, the Modular Audiovisual Speech SYnthesizer. The face module combines two approaches to visual speech synthesis: the articulation space is parameterized, yet the visual synthesis is image based. In contrast, other image-based audiovisual speech synthesizers such as MIKETALK and VIDEO REWRITE concatenate pre-recorded video images or sequences. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. The video synthesis then searches an image database for video frames that match these commands. If no appropriate frame is found, the image is generated by deforming a neutral image. MPEG-4 FDPs and additional points in the mouth opening area and around the lower jaw are defined as feature points in the neutral image, and a two-dimensional displacement vector is defined for each feature point. The displacement vector of any point inside a triangle of feature points is interpolated from the displacement vectors of the triangle's vertices.
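The abstract does not state which interpolation scheme is used inside a triangle. The following is a minimal sketch that assumes linear (barycentric) interpolation, a common choice for triangle-based image warping. The function names `barycentric_weights` and `interpolate_displacement` and the sample coordinates are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def barycentric_weights(p: Vec2, a: Vec2, b: Vec2, c: Vec2) -> Tuple[float, float, float]:
    """Return the barycentric weights of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle: vertices are collinear")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_displacement(
    p: Vec2,
    vertices: Sequence[Vec2],
    displacements: Sequence[Vec2],
) -> Vec2:
    """Blend the three vertex displacement vectors using the weights of p.

    Assumption: linear (barycentric) blending; the paper may use another scheme.
    """
    weights = barycentric_weights(p, vertices[0], vertices[1], vertices[2])
    dx = sum(w * d[0] for w, d in zip(weights, displacements))
    dy = sum(w * d[1] for w, d in zip(weights, displacements))
    return dx, dy


# Hypothetical feature points (pixel coordinates) and their 2-D displacements.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
vertex_displacements = [(0.0, 0.0), (4.0, 0.0), (0.0, 8.0)]

pixel_displacement = interpolate_displacement((1.0, 1.0), triangle, vertex_displacements)
print(pixel_displacement)
```

Under this assumption, a pixel that coincides with a feature point receives exactly that point's displacement, and displacements vary continuously across shared triangle edges, so the deformed neutral image shows no seams between neighbouring triangles.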


Bibliographic reference. Fagel, Sascha (2004): "Video-realistic synthetic speech with a parametric visual speech synthesizer", in INTERSPEECH-2004, 2033-2036.