EUROSPEECH '97
5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997


Continuous Visual Speech Recognition Using Geometric Lip-Shape Models and Neural Networks

Alexandrina Rogozan, Paul Deleglise

Laboratoire d'Informatique de l'Universite du Maine, Universite du Maine, Le Mans Cedex 9, France

This paper describes a new approach to automatic speechreading. First, we use an efficient yet effective representation of visible speech: a geometric lip-shape model. Then we present an automatic, objective method for merging phonemes that appear visually similar into visemes for our speaker. To determine the visemes, we trained self-organizing maps (SOM) using the Kohonen algorithm on each phoneme extracted from our visual database. We then present our visual speech recognition systems, which are based on heuristics and neural networks (TDNN or JNN) trained to discriminate visual information. On a continuous spelling task, visual-only recognition performance of about 37% was achieved using the TDNN and about 33% using the JNN.
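
The viseme-determination step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the per-phoneme maps are compared or merged, so the two-dimensional lip features (width, height), the synthetic samples, the codebook distance, the greedy merging rule, and the threshold below are all assumptions made for illustration.

```python
import numpy as np

def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D Kohonen map; returns a codebook of shape (n_units, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise the units from randomly chosen training samples.
    weights = data[rng.choice(len(data), n_units, replace=True)].astype(float)
    positions = np.arange(n_units)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def codebook_distance(a, b):
    """Symmetric mean nearest-unit distance between two codebooks (assumed measure)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def group_visemes(codebooks, threshold):
    """Greedy merge (assumed rule): a phoneme joins the first class whose
    representative codebook is closer than `threshold`, else starts a new class."""
    groups = []
    for name in codebooks:
        for group in groups:
            if codebook_distance(codebooks[name], codebooks[group[0]]) < threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical lip-shape features (width, height) per phoneme; synthetic data only.
rng = np.random.default_rng(42)
centers = {"p": (0.2, 0.1), "b": (0.2, 0.1), "a": (0.6, 0.8)}
samples = {ph: rng.normal(c, 0.02, size=(30, 2)) for ph, c in centers.items()}

codebooks = {ph: train_som(x) for ph, x in samples.items()}
groups = group_visemes(codebooks, threshold=0.2)
print(groups)
```

With this synthetic data, the bilabials "p" and "b" are drawn from the same lip-shape distribution and end up in one class, while the open vowel "a" forms its own. On real data the choice of distance and threshold would need to be validated per speaker.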

Bibliographic reference. Rogozan, Alexandrina / Deleglise, Paul (1997): "Continuous visual speech recognition using geometric lip-shape models and neural networks", in EUROSPEECH-1997, 1999-2002.