Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Audiovisual Text-to-Cued Speech Synthesis

Guillaume Gibert, Gérard Bailly, Frédéric Elisei

Institut de la Communication Parlée (ICP), UMR CNRS, Grenoble, France

We present our efforts toward implementing a system able to synthesize French Manual Cued Speech (FMCS). We recorded and analyzed the 3D trajectories of 50 hand and 63 facial flesh points during the production of 238 utterances carefully designed to cover all possible diphones of the French language. Linear and nonlinear statistical models of hand and face deformations and postures were developed using both separate and joint corpora. We created two separate dictionaries, one containing diphones and the other containing "dikeys". Using these two dictionaries, we implemented a complete text-to-cued-speech synthesis system by concatenation of diphones and dikeys.
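The concatenative approach described above can be sketched as a dictionary lookup: a phoneme sequence is split into overlapping diphone labels, and the stored trajectory segment for each label is appended to the output. This is a minimal illustration, not the authors' implementation; the dictionary contents, label format, and frame representation here are all hypothetical.

```python
# Minimal sketch of concatenative synthesis by dictionary lookup.
# NOTE: the dictionary, label format ("x-y"), and frame values are
# hypothetical placeholders, not the authors' actual data structures.

def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping diphone labels."""
    return [phonemes[i] + "-" + phonemes[i + 1]
            for i in range(len(phonemes) - 1)]

def concatenate(units, dictionary):
    """Look up each unit and append its stored trajectory frames."""
    frames = []
    for unit in units:
        frames.extend(dictionary[unit])
    return frames

# Toy dictionary: each diphone maps to a short list of trajectory frames.
diphone_dict = {"b-a": [0.0, 0.1], "a-l": [0.2, 0.3], "l-i": [0.4, 0.5]}
units = to_diphones(["b", "a", "l", "i"])   # ["b-a", "a-l", "l-i"]
trajectory = concatenate(units, diphone_dict)
```

The same lookup-and-append scheme applies to the hand-gesture stream, with "dikeys" (transitions between successive hand keys) taking the place of diphones.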


Bibliographic reference: Gibert, Guillaume / Bailly, Gérard / Elisei, Frédéric (2004): "Audiovisual text-to-cued speech synthesis", in SSW5-2004, 85-90.