September 22-25, 1997
This paper presents a method for extracting articulatory parameters directly from raw images of the lips. The system architecture consists of three independent parts. First, a new greyscale mouth image is centred and downsampled. Second, the image is aligned and projected onto a basis of artificial images. These images are the eigenvectors computed by a principal component analysis (PCA) of a set of 23 reference lip shapes. Third, a multilinear interpolation predicts the articulatory parameters from the coefficients of the image projection onto the eigenvectors. In addition, both the projection coefficients and the predicted parameters were evaluated with an HMM-based visual speech recogniser. The recognition scores obtained with our method are compared to reference scores and discussed.
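The projection and interpolation steps can be illustrated with a small sketch. It is not the paper's implementation: the function names (`project`, `bilinear`), the 4-pixel toy image, the two hand-written orthonormal basis vectors, and the 2x2 grid of parameter values are all hypothetical, and the interpolation is reduced to the two-coefficient (bilinear) case on a unit square, which the abstract does not specify.

```python
# Toy sketch: project a centred image onto an orthonormal basis, then
# interpolate an articulatory parameter from the two projection coefficients.
# All data and names below are illustrative assumptions, not the paper's.

def project(image, mean, basis):
    """Return the coefficients of (image - mean) on each basis vector."""
    centred = [p - m for p, m in zip(image, mean)]
    return [sum(c * b for c, b in zip(centred, vec)) for vec in basis]

def bilinear(c1, c2, grid):
    """Multilinear interpolation for two coefficients on the unit square.

    grid[i][j] is the (assumed) parameter value at corner c1 = i, c2 = j.
    Coefficients outside [0, 1] are clamped to the square.
    """
    c1 = min(max(c1, 0.0), 1.0)
    c2 = min(max(c2, 0.0), 1.0)
    return (grid[0][0] * (1 - c1) * (1 - c2)
            + grid[1][0] * c1 * (1 - c2)
            + grid[0][1] * (1 - c1) * c2
            + grid[1][1] * c1 * c2)

# Stand-ins for the PCA eigenvectors (orthonormal, 4 "pixels" each).
basis = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
mean = [0.25, 0.25, 0.25, 0.25]    # assumed mean image
image = [0.75, 0.25, 0.75, 0.25]   # assumed downsampled mouth image

# Assumed parameter values (e.g. a lip width) at the four corners.
grid = [[0.0, 10.0],
        [20.0, 30.0]]

coeffs = project(image, mean, basis)
predicted = bilinear(coeffs[0], coeffs[1], grid)
print(coeffs, predicted)
```

In a real system the basis would come from the PCA of the reference lip shapes and the interpolation would cover as many coefficients as are retained; the sketch only shows how the two stages connect.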
Bibliographic reference. Reveret, Lionel (1997): "From raw images of the lips to articulatory parameters: a viseme-based prediction", In EUROSPEECH-1997, 2011-2014.