5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

From Raw Images of the Lips to Articulatory Parameters: A Viseme-Based Prediction

Lionel Reveret

Institut de la Communication Parlée, Université Stendhal / INPG, Cedex 9 Grenoble, France

This paper presents a method for extracting articulatory parameters by directly processing raw images of the lips. The system architecture consists of three independent parts. First, a new greyscale mouth image is centred and downsampled. Second, the image is aligned and projected onto a basis of artificial images; these images are the eigenvectors computed by a PCA applied to a set of 23 reference lip shapes. Third, a multilinear interpolation predicts articulatory parameters from the coefficients of the image's projection onto the eigenvectors. In addition, both the projection coefficients and the predicted parameters were evaluated with an HMM-based visual speech recogniser. The recognition scores obtained with our method are compared to reference scores and discussed.
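The projection and prediction stages described above can be illustrated with a minimal sketch. It is not the paper's implementation: the image size (32 x 32), the number of retained eigenvectors (10), the number of articulatory parameters (3), and all function names are assumptions chosen for illustration, and the data are random placeholders. An affine least-squares map stands in for the multilinear interpolation, whose exact form the abstract does not give.

```python
import numpy as np

def fit_pca(refs, n_components):
    """Return the mean image and the leading eigenvectors (rows) of the reference set."""
    mean = refs.mean(axis=0)
    # The right singular vectors of the centred data are the PCA eigenvectors.
    _, _, vt = np.linalg.svd(refs - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(image, mean, basis):
    """Projection coefficients of a flattened image on the eigenvector basis."""
    return basis @ (image - mean)

def fit_affine_map(coeffs, params):
    """Least-squares affine map from coefficients to parameters
    (an assumed stand-in for the paper's multilinear interpolation)."""
    design = np.hstack([coeffs, np.ones((coeffs.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, params, rcond=None)
    return weights

def predict(coeff, weights):
    """Apply the affine map to one coefficient vector."""
    return np.append(coeff, 1.0) @ weights

# Placeholder data: 23 reference lip images and hypothetical parameter values.
rng = np.random.default_rng(0)
refs = rng.random((23, 32 * 32))      # assumed 32 x 32 downsampled images
ref_params = rng.random((23, 3))      # assumed 3 articulatory parameters

mean, basis = fit_pca(refs, n_components=10)
ref_coeffs = (refs - mean) @ basis.T
weights = fit_affine_map(ref_coeffs, ref_params)

new_image = rng.random(32 * 32)       # stands in for a centred, downsampled mouth image
coeff = project(new_image, mean, basis)
predicted = predict(coeff, weights)
```

In this sketch, either `coeff` or `predicted` could serve as the feature vector passed to a recogniser, mirroring the two feature sets the abstract says were evaluated.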

Bibliographic reference. Reveret, Lionel (1997): "From raw images of the lips to articulatory parameters: a viseme-based prediction", in EUROSPEECH-1997, 2011-2014.