INTERSPEECH 2004 - ICSLP
In this paper, we present a solution to the problem of tracking 3D information about the shape of the lips from a 2D image of a speaker. We focus on lip-tracking in audio-visual speech recordings from a corpus recorded in a moving car. In real conditions, the head of the speaker (a car driver) can move and turn in various directions. To cope with these movements and to avoid recognition errors caused by the changing 3D position of the lips, our algorithm uses a 3D-model-based approach to the lip-tracking process. First, we present a method for creating and clustering the lip shape models. Next, we describe an algorithm for finding the shape of the lips in an image using image processing. We then present the application of a distance function for choosing the model that best represents the lip shape obtained by image processing. Finally, we discuss the results.
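The model-selection step can be illustrated with a minimal sketch. The abstract does not specify the distance function or how lip shapes are represented, so everything here is an assumption: each shape is a list of corresponding 2D contour points, each model is already projected to the image plane, and the distance is the mean Euclidean distance between corresponding points. The names `shape_distance` and `choose_best_model`, and the toy data, are hypothetical.

```python
import math


def shape_distance(a, b):
    """Mean Euclidean distance between corresponding 2D points.

    Assumed metric for illustration only; the paper's actual
    distance function is not given in the abstract.
    """
    if len(a) != len(b):
        raise ValueError("shapes must have the same number of points")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def choose_best_model(observed, models):
    """Return the name of the model closest to the observed shape."""
    return min(models, key=lambda name: shape_distance(observed, models[name]))


# Hypothetical projected lip models: left corner, top, right corner, bottom.
models = {
    "closed": [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0), (1.0, -0.2)],
    "open": [(0.0, 0.0), (1.0, 0.8), (2.0, 0.0), (1.0, -0.8)],
}

# Hypothetical lip contour obtained by image processing.
observed = [(0.0, 0.0), (1.0, 0.7), (2.0, 0.0), (1.0, -0.6)]

best = choose_best_model(observed, models)
print(best)
```

With this toy data the observed contour lies closer to the "open" model than to the "closed" one, so "open" is selected. A real system would compare against clustered 3D models under the estimated head pose rather than fixed 2D point lists.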
Bibliographic reference. Cisar, Petr / Krnoul, Zdenek / Zelezny, Milos (2004): "3d lip-tracking for audio-visual speech recognition in real applications", In INTERSPEECH-2004, 2521-2524.