Auditory-Visual Speech Processing (AVSP) 2009
University of East Anglia, Norwich, UK
For automatic lipreading there are many competing methods for
feature extraction. Often, because of the complexity of the task,
these methods are tested only on quite restricted datasets, such as
the letters of the alphabet or digits, and on data from only a few speakers.
In this paper we compare some of the leading methods for lip
feature extraction on the GRID dataset, which uses a constrained
vocabulary over, in this case, 15 speakers. Previously the GRID
data has received limited attention because of the requirement to
track the face and lips accurately. We overcome this with a novel
linear predictor (LP) tracker, which we use to control an Active
Appearance Model (AAM).
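An LP tracker of this kind learns, offline, a linear map from sparse intensity differences to point displacements, so that at run time a single matrix multiplication yields the update for each tracked landmark. The following is a minimal numpy sketch of that idea only; the function names, support-pixel offsets, and training ranges are illustrative assumptions, not the authors' implementation.

    # Sketch of a linear predictor for point tracking (assumed details,
    # not the paper's code): learn a matrix P mapping intensity
    # differences at support pixels to the corrective displacement.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_intensities(image, centre, offsets):
        """Sample intensities at support-pixel offsets around the
        current estimate of the tracked point (row, col)."""
        coords = np.round(centre + offsets).astype(int)
        coords[:, 0] = np.clip(coords[:, 0], 0, image.shape[0] - 1)
        coords[:, 1] = np.clip(coords[:, 1], 0, image.shape[1] - 1)
        return image[coords[:, 0], coords[:, 1]].astype(float)

    def train_linear_predictor(image, true_centre, offsets,
                               n_train=500, max_disp=5.0):
        """Learn P from synthetic perturbations: displace the point,
        record the intensity difference, and solve by least squares."""
        template = sample_intensities(image, true_centre, offsets)
        D, H = [], []
        for _ in range(n_train):
            d = rng.uniform(-max_disp, max_disp, size=2)
            h = sample_intensities(image, true_centre + d, offsets) - template
            D.append(d)
            H.append(h)
        D, H = np.array(D).T, np.array(H).T      # 2 x N and S x N
        P = D @ np.linalg.pinv(H)                # least-squares solution
        return P, template

    def predict_displacement(P, template, image, centre, offsets):
        """One tracking update: displacement predicted for a new frame."""
        h = sample_intensities(image, centre, offsets) - template
        return P @ h

The predicted displacements update the landmark estimates frame by frame, and the tracked landmarks in turn drive the AAM fit.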
By ignoring the shape and/or appearance parameters from the AAM we can quantify the effect of appearance and/or shape when lipreading. We find that shape alone is a useful cue for lipreading (which is consistent with human experiments). However, the incremental effect of shape over appearance does not appear to be significant, which implies that the inner appearance of the mouth contains more information than the shape.
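Concretely, if the per-frame AAM parameter vector stores the shape coefficients first and the appearance coefficients after them, the three feature variants can be obtained by slicing. This is a hedged sketch under that assumed layout; the split index and names are not taken from the paper.

    # Assumed layout: params is (n_frames, n_shape + n_app), with shape
    # coefficients in the first n_shape columns.
    import numpy as np

    def aam_features(params, n_shape, mode="combined"):
        """Select shape-only, appearance-only, or combined AAM features."""
        if mode == "shape":
            return params[:, :n_shape]    # shape coefficients only
        if mode == "appearance":
            return params[:, n_shape:]    # appearance coefficients only
        return params                     # full combined parameterisation

Training and testing a recogniser on each variant then isolates the contribution of each cue to lipreading performance.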
Index Terms: lipreading, feature extraction, feature comparison, tracking
Bibliographic reference. Lan, Yuxuan / Harvey, Richard / Theobald, Barry-John / Ong, Eng-Jon / Bowden, Richard (2009): "Comparing visual features for lipreading", In AVSP-2009, 102-106.