Auditory-Visual Speech Processing 2007 (AVSP2007)

Kasteel Groenendaal, Hilvarenbeek, The Netherlands
August 31 - September 3, 2007

An Extended Pose-Invariant Lipreading System

Patrick Lucey (1), Gerasimos Potamianos (2), Sridha Sridharan (1)

(1) Speech, Audio, Image and Video Technology Laboratory, Queensland University of Technology, Brisbane, Australia
(2) IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

In recent work, we have concentrated on the problem of lipreading from non-frontal views (poses). In particular, we have focused on the use of profile views, and proposed two approaches for lipreading on the basis of visual features extracted from such views: (a) Direct statistical modeling of the features, namely the use of view-dependent statistical models; and (b) Normalization of such features by their projection onto the "space" of frontal-view visual features, which allows employing one set of statistical models for all available views. The latter approach has previously been considered for only two poses (frontal and profile views), and for visual features of a specific dimensionality. In this paper, we further extend this work by investigating its applicability to the case where data from three views are available (frontal, left-profile and right-profile). In addition, we examine the effect of visual feature dimensionality on the pose-normalization approach. Our experiments demonstrate that the results generalize well to three views, but also that feature dimensionality is crucial to the effectiveness of the approach. In particular, feature dimensionality larger than 30 is detrimental to multi-pose visual speech recognition performance.
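The pose-normalization idea in (b) can be sketched as learning a mapping from non-frontal features into the frontal feature space, so that a single set of frontal statistical models can score all views. The minimal sketch below uses an affine least-squares map; the function names, the affine form, and the use of synchronized frontal/profile feature pairs are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def fit_pose_normalizer(profile_feats, frontal_feats):
    """Fit an affine map (A, b) with frontal ~ profile @ A + b.

    profile_feats: (n_frames, d) features from the profile view
    frontal_feats: (n_frames, d) synchronized frontal-view features
    (Assumption: such synchronized pairs are available for training.)
    """
    # Append a bias column and solve the least-squares problem.
    X = np.hstack([profile_feats, np.ones((profile_feats.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, frontal_feats, rcond=None)
    return W[:-1], W[-1]  # A: (d, d), b: (d,)

def normalize(profile_feats, A, b):
    """Project profile-view features into the frontal feature space."""
    return profile_feats @ A + b

# Toy check: when the true relation is affine, the map is recovered.
rng = np.random.default_rng(0)
P = rng.normal(size=(200, 30))   # 30-dim profile features (toy data)
A_true = rng.normal(size=(30, 30))
F = P @ A_true + 0.5             # synthetic frontal features
A, b = fit_pose_normalizer(P, F)
assert np.allclose(normalize(P, A, b), F, atol=1e-6)
```

After normalization, profile-view observations can be passed to the same frontal-view models (e.g., HMMs) used for frontal data, which is what makes a single model set suffice across poses.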

Full Paper

Bibliographic reference.  Lucey, Patrick / Potamianos, Gerasimos / Sridharan, Sridha (2007): "An extended pose-invariant lipreading system", In AVSP-2007, paper L2-2.