Auditory-Visual Speech Processing (AVSP) 2009

University of East Anglia, Norwich, UK
September 10-13, 2009

Comparing Visual Features for Lipreading

Yuxuan Lan (1), Richard Harvey (1), Barry-John Theobald (1), Eng-Jon Ong (2), Richard Bowden (2)

(1) School of Computing Sciences, University of East Anglia, UK
(2) School of Electronics and Physical Sciences, University of Surrey, UK

For automatic lipreading, there are many competing methods for feature extraction. Often, because of the complexity of the task, these methods are tested only on quite restricted datasets, such as the letters of the alphabet or digits, and from only a few speakers. In this paper we take some of the leading methods for lip feature extraction and compare them on the GRID dataset, which uses a constrained vocabulary over, in this case, 15 speakers. Previously the GRID data has received limited attention because of the requirement to track the face and lips accurately. We overcome this via the use of a novel linear predictor (LP) tracker, which we use to control an Active Appearance Model (AAM).
   By ignoring shape and/or appearance parameters from the AAM we can quantify the effect of appearance and/or shape when lipreading. We find that shape alone is a useful cue for lipreading (which is consistent with human experiments). However, the incremental effect of shape over appearance appears not to be significant, which implies that the inner appearance of the mouth contains more information than the shape.
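   The comparison of shape-only, appearance-only, and combined features can be pictured as selecting which AAM parameters enter the per-frame feature vector. The following is a minimal illustrative sketch, not the authors' implementation; the function name aam_features and the parameter dimensions are hypothetical.

   import numpy as np

   # Hypothetical sketch: an AAM fit to each frame yields a vector of shape
   # parameters (lip contour) and a vector of appearance parameters (pixel
   # values inside the mouth region). Features are formed by keeping either
   # set, or concatenating both.
   def aam_features(shape_params, appearance_params,
                    use_shape=True, use_appearance=True):
       """Build a per-frame feature vector, optionally dropping shape or appearance."""
       parts = []
       if use_shape:
           parts.append(shape_params)
       if use_appearance:
           parts.append(appearance_params)
       if not parts:
           raise ValueError("at least one of shape or appearance must be used")
       return np.concatenate(parts)

   # Example with assumed dimensions: 10 shape and 20 appearance parameters.
   rng = np.random.default_rng(0)
   shape = rng.standard_normal(10)
   appearance = rng.standard_normal(20)

   shape_only = aam_features(shape, appearance, use_appearance=False)   # 10-D
   appearance_only = aam_features(shape, appearance, use_shape=False)   # 20-D
   combined = aam_features(shape, appearance)                           # 30-D

   In this framing, comparing recognition accuracy across the three feature sets is what isolates the contribution of shape, of appearance, and of their combination.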

Index Terms: lip-reading, feature extraction, feature comparison, tracking


Bibliographic reference.  Lan, Yuxuan / Harvey, Richard / Theobald, Barry-John / Ong, Eng-Jon / Bowden, Richard (2009): "Comparing visual features for lipreading", In AVSP-2009, 102-106.