This paper presents a system that recovers and tracks the 3D speech movements of a speaker's face in each image of a monocular sequence. A speaker-specific face model is used for tracking: model parameters are extracted from each image by an analysis-by-synthesis loop. To handle both the individual specificities of the speaker's articulation and the complexity of facial deformations during speech, speaker-specific models of the face geometry and appearance are built from real data. The geometric model is linearly controlled by only seven articulatory parameters. Appearance is represented either as a classical texture map or through the local appearance of a relevant subset of 3D points. We compare several appearance models: they are either constant or depend linearly on the articulatory parameters. We evaluate these different appearance models against ground-truth data.
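The abstract describes a shape model driven linearly by seven articulatory parameters, an appearance model that is either constant or linear in those same parameters, and an analysis-by-synthesis loop that recovers the parameters from each image. The sketch below is not the authors' implementation; it only illustrates that linear structure with hypothetical dimensions, random placeholder bases, and a toy geometric residual in place of the paper's image-based matching.

```python
# Minimal sketch (not the authors' code) of a linear articulatory shape model,
# an appearance model linear in the same parameters, and a toy
# analysis-by-synthesis loop. All names, sizes, and the fitting scheme are
# illustrative assumptions.
import numpy as np

N_VERTS = 300    # hypothetical number of 3D points on the face mesh
N_PARAMS = 7     # seven articulatory parameters, as in the paper

rng = np.random.default_rng(0)

# Shape model: mesh = mean shape + linear combination of basis deformations.
mean_shape = rng.normal(size=(N_VERTS, 3))
shape_basis = rng.normal(size=(N_PARAMS, N_VERTS, 3))

def synthesize_shape(alpha):
    """Return the 3D mesh for articulatory parameters alpha (length 7)."""
    return mean_shape + np.tensordot(alpha, shape_basis, axes=1)

# Appearance model: per-point grey levels, either constant or linear in alpha.
mean_appearance = rng.uniform(0.0, 1.0, size=N_VERTS)
appearance_basis = rng.normal(scale=0.05, size=(N_PARAMS, N_VERTS))

def synthesize_appearance(alpha, linear=True):
    """Predicted local appearance at each 3D point."""
    if not linear:
        return mean_appearance              # constant appearance model
    return mean_appearance + alpha @ appearance_basis

def analysis_by_synthesis(observed_points, n_iters=50, lr=0.1):
    """Toy analysis-by-synthesis: gradient descent on a geometric residual."""
    alpha = np.zeros(N_PARAMS)
    for _ in range(n_iters):
        residual = synthesize_shape(alpha) - observed_points   # (N_VERTS, 3)
        grad = np.tensordot(shape_basis, residual, axes=([1, 2], [0, 1]))
        alpha -= lr * grad / N_VERTS
    return alpha

# Usage: recover the parameters that generated a synthetic observation.
true_alpha = rng.normal(size=N_PARAMS)
observed = synthesize_shape(true_alpha)
estimate = analysis_by_synthesis(observed)
print(np.round(estimate - true_alpha, 3))   # should be close to zero
```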
Cite as: Odisio, M., Bailly, G. (2003) Shape and appearance models of talking faces for model-based tracking. Proc. Auditory-Visual Speech Processing, 105-110.
@inproceedings{odisio03_avsp,
  author={Matthias Odisio and Gérard Bailly},
  title={{Shape and appearance models of talking faces for model-based tracking}},
  year=2003,
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={105--110}
}