International Conference on Auditory-Visual Speech Processing 2008

Tangalooma Wild Dolphin Resort, Moreton Island, Queensland, Australia
September 26-29, 2008

A Model for the Dynamics of Articulatory Lip Movements

Þórir Harðarson (1), Hans-Heinrich Bothe (1,2)

(1) Centre for Applied Hearing Research, Technical University of Denmark, Lyngby, Denmark
(2) Institute of Biomedical Engineering, Technical University of Berlin, Germany

The present work is part of a framework for designing and implementing a multilingual language laboratory for speech reading (lip reading). It builds on the interdisciplinary project LIPPS at the Technical University of Berlin, Germany, which aims to develop a training aid for speech reading by generating text-driven facial animation from a single passport photo using 2D image morphing. The LIPPS system may be particularly helpful for patients with sudden profound hearing loss, enabling them to start learning speech reading while still in hospital after an operation or during subsequent rehabilitation.

The present project uses dynamic models of the changes in important visual features. We apply two ideas: i) specific ‘characteristic’ images are related to the sounds or phonemes of an utterance, and ii) visemes are related to the phonemes and represented by the dynamics of linear second-order models.
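As an illustration of what a linear second-order model of a visual feature can look like, the sketch below integrates x'' + 2*zeta*omega*x' + omega^2*x = omega^2*target for a single feature, for example a lip opening moving towards a viseme target. The abstract does not give the model equations or parameters, so the function name `simulate_second_order`, the natural frequency `omega`, the damping ratio `zeta`, and the time step are assumptions chosen only for illustration.

```python
def simulate_second_order(target, x0=0.0, v0=0.0, omega=40.0, zeta=1.0,
                          dt=0.001, steps=1000):
    """Integrate x'' + 2*zeta*omega*x' + omega**2 * x = omega**2 * target.

    All parameter values are hypothetical; they are not taken from the paper.
    Returns the list of feature values, one per time step (including x0).
    """
    x, v = x0, v0
    trajectory = [x]
    for _ in range(steps):
        # Acceleration from the spring-like pull towards the target
        # and the velocity-proportional damping term.
        a = omega ** 2 * (target - x) - 2.0 * zeta * omega * v
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then update position with the new velocity
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Hypothetical example: lip opening moves from 0 to 10 (arbitrary units).
    traj = simulate_second_order(target=10.0)
    print(f"value after {len(traj) - 1} steps: {traj[-1]:.3f}")
```

With a damping ratio near 1 the feature approaches its target smoothly, which is one plausible way to obtain a temporally varying viseme rather than a single static pose.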

We aim to extend the idea that visemes correspond to single characteristic images or poses of the face towards temporally varying units, as is the case for the corresponding auditory units, the phonemes.

We analyzed video clips of moving faces and modeled the prediction of certain visual features at the locations of the characteristic images (the characteristic instances), as well as the transitional changes of the feature sets between neighboring characteristic instances. Contextual modulations of the visual features are described with the help of a dominance model. High dominance is given to visemes with indispensable features such as complete or partial lip closure (e.g., bilabial or fricative visemes), whereas low dominance is given to practically invisible visemes (e.g., guttural visemes), during which the lips mainly prepare the transition towards later dominant phonemes.
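One way to picture a dominance model is as a weighted average of neighboring viseme targets, where each weight is the viseme's dominance attenuated by its distance in time. The sketch below uses an exponentially decaying weight, which is an assumption made here for illustration; the abstract does not specify the functional form. The function name `blend_feature`, the `rate` parameter, and all numeric values are likewise hypothetical.

```python
import math


def blend_feature(t, segments, rate=20.0):
    """Dominance-weighted average of viseme targets at time t.

    segments: list of (center_time, target_value, dominance) tuples.
    The exponential decay and the rate value are illustrative assumptions,
    not the formulation used in the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for center, target, dominance in segments:
        # Influence decays with temporal distance from the viseme center.
        weight = dominance * math.exp(-rate * abs(t - center))
        weighted_sum += weight * target
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical lip-opening targets (arbitrary units):
    # a highly dominant bilabial closure followed by a weakly dominant
    # guttural viseme with a nominally open lip target.
    segments = [(0.10, 0.0, 1.0), (0.20, 8.0, 0.1)]
    for t in (0.10, 0.15, 0.20):
        print(f"t={t:.2f}s lip opening={blend_feature(t, segments):.2f}")
```

In this sketch, the lip opening midway between the two visemes stays close to the closure target, reflecting the high dominance of the bilabial viseme over the nearly invisible guttural one.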

The described method may also be applied to other types of facial animation systems, for example to the control parameters of anatomical face models.

Bibliographic reference. Harðarson, Þórir / Bothe, Hans-Heinrich (2008): "A model for the dynamics of articulatory lip movements", in AVSP-2008, 209-214.