Visible speech movements were recorded by motion capture and parameterized. Coarticulated targets were extracted from vowel-consonant-vowel (VCV) sequences and modeled so that arbitrary German utterances can be generated by interpolating between targets. The system was extended to synthesize English utterances by mapping English phonemes to German phonemes. An evaluation with a modified rhyme test shows that adding the synthetic video of isolated words to audio-only presentation raises recognition scores from 27% to 47.5%.
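As a rough illustration of what generation by target interpolation can look like, the following sketch interpolates articulatory parameter vectors between timed targets. The abstract does not state which interpolation scheme the system uses, so linear interpolation is an assumption here, and the function name, parameter layout, and example values are hypothetical.

```python
from bisect import bisect_right


def interpolate_targets(times, targets, t):
    """Return the parameter vector at time t.

    times   -- ascending list of target times in seconds (hypothetical layout)
    targets -- list of parameter vectors, one per entry in times
    t       -- query time in seconds

    Linear interpolation is an assumption; the original system may use
    a different scheme.
    """
    # Hold the first or last target outside the covered time range.
    if t <= times[0]:
        return list(targets[0])
    if t >= times[-1]:
        return list(targets[-1])

    # Find the target at or before t and blend toward the next one.
    i = bisect_right(times, t) - 1
    w = (t - times[i]) / (times[i + 1] - times[i])
    return [(1 - w) * a + w * b for a, b in zip(targets[i], targets[i + 1])]


# Illustrative values only: three targets with two parameters each
# (for example, jaw opening and lip rounding).
times = [0.0, 0.1, 0.2]
targets = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
frame = interpolate_targets(times, targets, 0.05)
```

Sampling this function at the video frame rate would give one parameter vector per frame, which could then drive a face model.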
Bibliographic reference. Fagel, Sascha / Elisei, Frédéric / Bailly, Gérard (2008): "From 3-d speaker cloning to text-to-audiovisual-speech", In INTERSPEECH-2008, 2325.