ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)

September 26-27, 1997
Rhodes, Greece

Recovering 3D Lip Structure from 2D Observations Using a Model Trained from Video

Sumit Basu, Alex Pentland

Perceptual Computing Section, MIT Media Laboratory, Cambridge, MA, USA

We present a method for recovering 3D lip structure from 2D video observations. We develop a physically based 3D model of human lips and a framework for training it from real data. The model starts with unconstrained degrees of freedom and learns a small subspace of permissible motions that explains over 99% of the variance in the observations. The resulting subspace allows the 3D lip shape to be estimated from sparse or coarse observations. We show results demonstrating the model's ability to reconstruct lip shapes from 2D data (marked points and raw video). The resulting model can be used for both analysis and synthesis.
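
As a rough illustration of the subspace idea summarized above, the following sketch learns a low-dimensional linear subspace from example 3D shapes and then estimates a full 3D shape from its 2D coordinates alone. It is not the authors' physically based model or training procedure: plain PCA is swapped in as a stand-in, the projection is assumed to be orthographic with known point correspondences, and the training data are synthetic. The names `learn_subspace` and `reconstruct_from_2d` are hypothetical.

```python
import numpy as np

def learn_subspace(shapes, variance_kept=0.99):
    """PCA stand-in: shapes is (n_samples, 3 * n_points), rows are flattened 3D shapes."""
    mean = shapes.mean(axis=0)
    # Rows of vt are principal directions; s**2 is proportional to variance per direction.
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(explained, variance_kept)) + 1
    return mean, vt[:k].T

def reconstruct_from_2d(mean, basis, xy_observed):
    """Least-squares fit of subspace coefficients to observed (x, y) coordinates only.

    Assumes orthographic projection and a flattened layout (x0, y0, z0, x1, ...).
    """
    n_points = mean.size // 3
    idx = np.array([3 * i + c for i in range(n_points) for c in (0, 1)])
    coeffs, *_ = np.linalg.lstsq(basis[idx], xy_observed.ravel() - mean[idx], rcond=None)
    return (mean + basis @ coeffs).reshape(n_points, 3)

# Synthetic training data (an assumption): 10 points moving in 3 hidden modes.
rng = np.random.default_rng(0)
n_points, n_modes = 10, 3
rest = rng.normal(size=3 * n_points)
modes, _ = np.linalg.qr(rng.normal(size=(3 * n_points, n_modes)))
train = rest + rng.normal(size=(200, n_modes)) @ modes.T
mean, basis = learn_subspace(train)

# Recover depth (z) for an unseen shape from its x, y coordinates only.
true_shape = (rest + modes @ np.array([0.5, -1.0, 0.25])).reshape(n_points, 3)
estimate = reconstruct_from_2d(mean, basis, true_shape[:, :2])
max_error = float(np.max(np.abs(estimate - true_shape)))
```

Because the synthetic shapes lie exactly in a three-mode subspace, the depth coordinates are recovered almost exactly here; with real, noisy observations the fit would only be approximate, and a regularized estimate would likely be preferable.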

