Auditory-Visual Speech Processing (AVSP) 2010
Hakone, Kanagawa, Japan
We describe preliminary work towards an objective method for identifying visemes. Active appearance model (AAM) features are used to parameterise a speaker's lips and jaw during speech. The temporal behaviour of AAM features between automatically identified salient points is used to represent visual speech gestures, and visemes are created by clustering these gestures using dynamic time warping (DTW) as a cost function. This method produces a significantly more structured model of visual speech than if a typical phoneme-to-viseme mapping is assumed.
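The clustering step described above needs a pairwise cost between gestures of different durations, which is what DTW provides. The sketch below shows the standard quadratic dynamic-programming recurrence for DTW over sequences of feature vectors; it is illustrative only, with assumed shapes (one AAM feature vector per video frame) and names not taken from the paper.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two feature trajectories.

    a, b: arrays of shape (T1, d) and (T2, d), e.g. per-frame AAM
    feature vectors spanning one visual speech gesture. Sequence
    lengths T1 and T2 may differ; DTW aligns them non-linearly.
    """
    n, m = len(a), len(b)
    # cost[i, j] = minimum warped cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

A pairwise matrix of these distances can then feed any standard clustering method (e.g. hierarchical agglomerative clustering); the paper does not specify the exact clustering algorithm here, so that choice is an assumption.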
Index Terms: Visemes, visual speech encoding
Bibliographic reference. Hilder, Sarah / Theobald, Barry-John / Harvey, Richard (2010): "In pursuit of visemes", In AVSP-2010, paper S8-2.