September 22-25, 1997
This paper describes a new approach to automatic speechreading. First, we use an efficient yet effective representation of visible speech: a geometric lip-shape model. We then present an automatic, objective method for merging phonemes that appear visually similar into visemes for our speaker. To determine the visemes, we trained a self-organizing map (SOM) using the Kohonen algorithm on each phoneme extracted from our visual database. Finally, we present our visual speech recognition systems, which are based on heuristics and on neural networks (TDNN or JNN) trained to discriminate visual information. On a continuous spelling task, visual-alone recognition performance of about 37% was achieved with the TDNN and about 33% with the JNN.
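As an illustration of the Kohonen training step mentioned above, the following is a minimal sketch of a one-dimensional SOM trained on lip-shape feature vectors. It is not the authors' implementation: the abstract does not give the map size, the training schedule, the lip-shape parameters, or the rule used to merge phonemes into visemes, so the function names (`train_som`, `best_matching_unit`), the hyperparameters, and the two-dimensional toy features are all assumptions made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D Kohonen map on a list of equal-length feature vectors.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Initialise each unit with a copy of a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.1)  # radius shrinks, floored at 0.1
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood centred on the winning unit.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy data: two hypothetical phoneme classes with clearly different lip shapes,
# given as (width, height) pairs. The values are invented for illustration.
closed_lips = [[0.9 + 0.01 * i, 0.1 + 0.01 * i] for i in range(5)]
open_lips = [[0.4 + 0.01 * i, 0.8 + 0.01 * i] for i in range(5)]
som = train_som(closed_lips + open_lips)

print(best_matching_unit(som, closed_lips[0]))
print(best_matching_unit(som, open_lips[0]))
```

In this sketch, frames whose lip shapes differ strongly are expected to activate different map units, while visually similar frames share a unit. How the trained maps are compared to decide which phonemes form a viseme is not stated in the abstract and is left out here.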
Bibliographic reference. Rogozan, Alexandrina / Deleglise, Paul (1997): "Continuous visual speech recognition using geometric lip-shape models and neural networks", In EUROSPEECH-1997, 1999-2002.