Auditory-Visual Speech Processing 2007 (AVSP2007)
Kasteel Groenendaal, Hilvarenbeek, The Netherlands
In recent work, we have concentrated on the problem of lipreading from non-frontal views (poses). In particular, we have focused on the use of profile views, and proposed two approaches for lipreading on the basis of visual features extracted from such views: (a) Direct statistical modeling of the features, namely the use of view-dependent statistical models; and (b) Normalization of such features by their projection onto the "space" of frontal-view visual features, which allows a single set of statistical models to be employed for all available views. The latter approach has previously been considered for only two poses (frontal and profile views), and for visual features of a specific dimensionality. In this paper, we extend this work by investigating its applicability to the case where data from three views are available (frontal, left-profile, and right-profile). In addition, we examine the effect of visual feature dimensionality on the pose-normalization approach. Our experiments demonstrate that the results generalize well to three views, but also that feature dimensionality is crucial to the effectiveness of the approach. In particular, feature dimensionality larger than 30 is detrimental to multi-pose visual speech recognition performance.
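The pose-normalization idea in approach (b) can be sketched as a learned mapping from profile-view features into the frontal-view feature space. The snippet below is a minimal illustration only, assuming a linear map estimated by least squares from paired (profile, frontal) feature vectors; the variable names, synthetic data, and the choice of a purely linear projection are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

# Illustrative sketch (assumed setup): paired visual-feature vectors of the
# same utterances captured simultaneously from profile and frontal views.
rng = np.random.default_rng(0)

D = 30    # feature dimensionality (the paper reports >30 degrades performance)
N = 500   # number of paired training frames (illustrative)

X_profile = rng.standard_normal((N, D))    # profile-view feature vectors
A_true = rng.standard_normal((D, D))       # hidden view-to-view relation
X_frontal = X_profile @ A_true             # paired frontal-view feature vectors

# Least-squares estimate of the projection matrix W:
#   W = argmin_W || X_profile @ W - X_frontal ||^2
W, *_ = np.linalg.lstsq(X_profile, X_frontal, rcond=None)

# At test time, a profile-view feature vector is "normalized" into the
# frontal-view space, so one set of frontal statistical models serves all views.
x_test = rng.standard_normal(D)
x_normalized = x_test @ W
```

With such a projection in place, recognition uses only frontal-view models regardless of the camera pose, which is the practical appeal of approach (b) over maintaining separate view-dependent models as in approach (a).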
Bibliographic reference. Lucey, Patrick / Potamianos, Gerasimos / Sridharan, Sridha (2007): "An extended pose-invariant lipreading system", In AVSP-2007, paper L2-2.