ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)
September 26-27, 1997
Lip synchronization is the determination of the motion of the mouth and tongue during speech. It can be deduced from the speech signal without phonemic analysis, regardless of the content of the speech. Our method is based on the observation that the position of the mouth over a short interval of time can be correlated with the basic shape of the spectrum of the speech over that same interval. The spectrum is obtained from a Fast Fourier Transform (FFT) and treated as a discrete probability density function. Statistical measures called moments are used to describe its shape.
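The idea of treating a magnitude spectrum as a discrete probability density and summarizing it by its moments can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window choice, frame length, and the number of moments are assumptions, and the function names are hypothetical.

```python
import numpy as np

def spectral_moments(frame, n_moments=4):
    """Treat the magnitude spectrum of a short speech frame as a
    discrete probability density over frequency bins and return its
    first raw moment (the spectral mean) followed by central moments
    of orders 2..n_moments."""
    # Window the frame (Hann assumed here) and take the FFT magnitude.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    pdf = spectrum / spectrum.sum()        # normalize so the bins sum to 1
    bins = np.arange(len(pdf))
    mean = np.sum(bins * pdf)              # first raw moment (spectral centroid)
    central = [np.sum((bins - mean) ** k * pdf)
               for k in range(2, n_moments + 1)]
    return [mean] + central

# Example on a synthetic 256-sample frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
moments = spectral_moments(frame)
```

Higher-order central moments (variance, skewness, kurtosis) capture the spread and asymmetry of the spectral shape, which is what lets the shape be summarized by a few numbers.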
For several canonical utterances, video measurements of a speaker's mouth are combined with the corresponding moments to produce continuous predictor surfaces for each of three mouth parameters: jaw position, and the horizontal and vertical openings between the lips. The method incorporates smoothing, so it is insensitive to the local behavior of the spectrum.
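One way a continuous predictor surface of this kind could be built is by least-squares fitting of a smooth function of two moments to a measured mouth parameter. The quadratic form, the choice of two moments as inputs, and the synthetic data below are all assumptions for illustration; the paper does not specify its fitting procedure here.

```python
import numpy as np

def fit_surface(m1, m2, target):
    """Fit a quadratic surface p(m1, m2) to a measured mouth parameter
    (e.g. jaw position) by least squares over training samples."""
    A = np.column_stack([np.ones_like(m1), m1, m2,
                         m1 * m2, m1 ** 2, m2 ** 2])
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coeffs

def predict(coeffs, m1, m2):
    """Evaluate the fitted surface at new moment values."""
    A = np.column_stack([np.ones_like(m1), m1, m2,
                         m1 * m2, m1 ** 2, m2 ** 2])
    return A @ coeffs

# Synthetic check: the fit recovers a known smooth surface.
rng = np.random.default_rng(1)
m1 = rng.uniform(0.0, 1.0, 50)
m2 = rng.uniform(0.0, 1.0, 50)
target = 0.5 + 2.0 * m1 - m2 + 0.3 * m1 * m2
coeffs = fit_surface(m1, m2, target)
residual = np.max(np.abs(predict(coeffs, m1, m2) - target))
```

Because the fitted surface is smooth in the moments, small local fluctuations in the spectrum perturb the predicted mouth parameters only slightly, which matches the smoothing property described above.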
Bibliographic reference. McAllister, David F. / Rodman, Robert D. / Bitzer, Donald L. / Freeman, Andrew S. (1997): "Lip synchronization of speech", In AVSP-1997, 133-136.