8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

3D Lip-tracking for Audio-Visual Speech Recognition in Real Applications

Petr Cisar, Zdenek Krnoul, Milos Zelezny

University of West Bohemia in Pilsen, Czech Republic

In this paper, we present a solution to the problem of tracking 3D information about the shape of the lips from a 2D picture of a speaker. We focus on lip-tracking in audio-visual speech recordings from a corpus recorded in a moving car. In real conditions, the head of the speaker (the car driver) can move and turn in various directions. To cope with these movements and to avoid recognition errors caused by the changing 3D position of the lips, our algorithm uses a 3D-model-based approach to the lip-tracking process. First, we present a method for creating and clustering the lip shape models. Next, we describe an algorithm for finding the shape of the lips in a picture using image processing. We then present the application of a distance function for choosing the model that best represents the lip shape obtained by image processing. Finally, we discuss the results.
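The model-selection step can be illustrated with a minimal sketch. The abstract does not specify the distance function or the model representation, so everything here is an assumption made for illustration: each model is taken to be a list of 2D contour points already projected into the image plane, the distance is the mean Euclidean distance between corresponding points, and the names `shape_distance`, `best_model`, and the model labels are hypothetical.

```python
import math


def shape_distance(shape_a, shape_b):
    """Mean Euclidean distance between corresponding 2D contour points.

    Assumption: both shapes list the same landmarks in the same order.
    This is an illustrative metric, not the one used in the paper.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must have the same number of points")
    total = sum(math.dist(p, q) for p, q in zip(shape_a, shape_b))
    return total / len(shape_a)


def best_model(observed, models):
    """Return the name of the model whose contour is closest to `observed`."""
    return min(models, key=lambda name: shape_distance(observed, models[name]))


# Hypothetical projected lip models: left corner, top, right corner, bottom.
models = {
    "closed": [(-2.0, 0.0), (0.0, 0.3), (2.0, 0.0), (0.0, -0.3)],
    "open": [(-1.8, 0.0), (0.0, 1.0), (1.8, 0.0), (0.0, -1.2)],
}

# Hypothetical lip contour found by image processing in one video frame.
observed = [(-1.9, 0.0), (0.0, 0.9), (1.9, 0.0), (0.0, -1.0)]

print(best_model(observed, models))
```

In a 3D-model-based tracker, the candidate contours would presumably be produced by projecting each 3D lip model under an estimated head pose before the comparison; that projection step is omitted here.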


Bibliographic reference. Cisar, Petr / Krnoul, Zdenek / Zelezny, Milos (2004): "3D lip-tracking for audio-visual speech recognition in real applications", in INTERSPEECH-2004, 2521-2524.