AVSP 2003 - International Conference on Audio-Visual Speech Processing
September 4-7, 2003
This paper presents a system that recovers and tracks the 3D speech movements of a speaker's face in each image of a monocular sequence. A speaker-specific face model is used for tracking: model parameters are extracted from each image by an analysis-by-synthesis loop. To handle both the individual specificities of the speaker's articulation and the complexity of facial deformations during speech, speaker-specific models of face geometry and appearance are built from real data. The geometric model is linearly controlled by only seven articulatory parameters. Appearance is modeled either as a classical texture map or via the local appearance of a relevant subset of 3D points. We compare several appearance models, which are either constant or depend linearly on the articulatory parameters, and evaluate them against ground-truth data.
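The linearly controlled geometric model described in the abstract can be sketched as a mean shape plus a weighted sum of deformation modes, one weight per articulatory parameter. The sketch below is an illustrative assumption only: the mesh size, the random deformation basis, and the function names are not from the paper, which builds its basis from real speaker data; only the seven-parameter linear control is taken from the abstract.

```python
import numpy as np

# Hypothetical dimensions; the paper's actual mesh size is not stated here.
N_VERTICES = 500   # illustrative number of 3D mesh vertices
N_PARAMS = 7       # seven articulatory parameters, per the abstract

rng = np.random.default_rng(0)
# Neutral face and per-parameter deformation modes (random stand-ins for
# the speaker-specific modes the authors learn from real data).
mean_shape = rng.standard_normal((N_VERTICES, 3))
basis = rng.standard_normal((N_PARAMS, N_VERTICES, 3))

def synthesize(params):
    """Return 3D vertex positions for an articulatory parameter vector.

    The shape is a linear function of the parameters:
    shape = mean_shape + sum_i params[i] * basis[i].
    """
    return mean_shape + np.tensordot(params, basis, axes=1)

# With all parameters at zero, the model reproduces the neutral face.
neutral = synthesize(np.zeros(N_PARAMS))
```

In an analysis-by-synthesis loop of the kind the abstract describes, such a `synthesize` step would be rendered and compared against the input image, and the seven parameters adjusted to minimize the discrepancy.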
Presentation. The presentation is packaged as a gzip-compressed tar archive (28 MB), which can be opened under both Windows and UNIX. To view the presentation, download the archive and decompress it using the "Use folder name" option. This creates a directory av03_105; in this directory, open the file Odisio.html.
Bibliographic reference. Odisio, Matthias / Bailly, Gérard (2003): "Shape and appearance models of talking faces for model-based tracking", In AVSP 2003, 105-110.