Auditory-Visual Speech Processing 2007 (AVSP2007)

Kasteel Groenendaal, Hilvarenbeek, The Netherlands
August 31 - September 3, 2007

Intelligibility of Natural and 3D-Cloned German Speech

Sascha Fagel (1), Gérard Bailly (2), Frédéric Elisei (2)

(1) Institute for Speech and Communication, Berlin University of Technology, Germany
(2) Department of Speech and Cognition, GIPSA-Lab Grenoble, France

We investigate the intelligibility of natural visual and audiovisual speech compared to re-synthesized speech movements rendered by a talking head. The talking head was created using the speaker cloning methodology of the Institut de la Communication Parlée in Grenoble (now the Department of Speech and Cognition at GIPSA-Lab). A German speaker with colored markers on the face was recorded audiovisually using multiple cameras. The three-dimensional coordinates of the markers were extracted and parameterized, and spoken VCV (vowel-consonant-vowel) sequences were then visually re-synthesized. A perception experiment measured the visual and audiovisual intelligibility of natural and synthesized video, using the original audio with and without added noise. Identification scores show that the clone recovers almost 70% of the intelligibility gain provided by the original face. Part of the remaining loss is attributable to visual cues missing from the present synthesis, notably the lack of a tongue.
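
To make the "almost 70% of the intelligibility gain" figure concrete, here is a minimal sketch of one common way such a ratio can be computed: the clone's improvement over audio alone divided by the natural face's improvement over audio alone. The abstract does not state the exact formula used in the paper, so this definition is an assumption. The function name `gain_recovered` and the example scores are hypothetical and are not taken from the paper.

```python
def gain_recovered(audio_only: float, av_natural: float, av_clone: float) -> float:
    """Fraction of the natural face's intelligibility gain that the clone recovers.

    All arguments are identification scores (proportion correct, 0..1) measured
    under the same noise condition. This definition is an assumption; the
    abstract does not give the paper's exact formula.
    """
    natural_gain = av_natural - audio_only
    if natural_gain <= 0:
        raise ValueError("the natural face provides no gain over audio alone")
    return (av_clone - audio_only) / natural_gain


# Hypothetical scores for illustration only, NOT results from the paper.
ratio = gain_recovered(audio_only=0.40, av_natural=0.80, av_clone=0.68)
print(f"Gain recovered by the clone: {ratio:.0%}")
```

Normalizing by the natural face's gain, rather than comparing raw scores, separates the contribution of the visual channel from whatever intelligibility the (possibly noisy) audio already provides.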

Bibliographic reference. Fagel, Sascha / Bailly, Gérard / Elisei, Frédéric (2007): "Intelligibility of natural and 3D-cloned German speech", in AVSP-2007, paper L2-1.