9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Lip Synchronization: From Phone Lattice to PCA Eigen-Projections Using Neural Networks

Samer Al Moubayed (1), Michael De Smet (2), Hugo Van hamme (2)

(1) KTH, Sweden; (2) Katholieke Universiteit Leuven, Belgium

Lip synchronization is the process of generating natural lip movements from a speech signal. In this work we address the lip-sync problem using an automatic phone recognizer that generates a phone lattice carrying posterior probabilities. The acoustic feature vector contains the posterior probabilities of all phones over a time window centered at the current time point. This representation therefore characterizes the phone recognizer's output, including the confusion patterns caused by its limited accuracy. A 3D face model with varying texture is computed by analyzing a video recording of the speaker using a 3D morphable model. A neural network trained on 30 000 data vectors from an audiovisual recording in Dutch produced a very good simulation of the face on independent data sets of the same speaker and of a different speaker.
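
The windowed feature vector described above can be illustrated with a minimal sketch. It assumes the phone posteriors are already available as a frames-by-phones array; the function name `windowed_posteriors`, the `half_width` parameter, and the edge-padding choice at the start and end of the utterance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def windowed_posteriors(posteriors, half_width):
    """Stack the posteriors of frames t-half_width .. t+half_width per frame t.

    posteriors: array of shape (num_frames, num_phones), one row per frame.
    half_width: number of context frames on each side (hypothetical parameter).
    Returns an array of shape (num_frames, (2 * half_width + 1) * num_phones).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    num_frames, _ = posteriors.shape
    window = 2 * half_width + 1
    # Assumption: repeat the first and last frame so every window is full length.
    padded = np.pad(posteriors, ((half_width, half_width), (0, 0)), mode="edge")
    # Row t of the result is the flattened window centered at original frame t.
    return np.stack([padded[t:t + window].ravel() for t in range(num_frames)])


# Toy input: 5 frames, 3 phones, each row a posterior distribution.
toy = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
])
features = windowed_posteriors(toy, half_width=1)
```

Each row of `features` would then serve as one input vector to the network, with the face-model parameters of the same frame as the regression target.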

Bibliographic reference.  Moubayed, Samer Al / Smet, Michael De / Van hamme, Hugo (2008): "Lip synchronization: from phone lattice to PCA eigen-projections using neural networks", In INTERSPEECH-2008, 2016-2019.