Auditory-Visual Speech Processing 2005

British Columbia, Canada
July 24-27, 2005

A Visual Concomitant of the Lombard Reflex

Jeesun Kim (1), Chris Davis (1), Guillaume Vignali (1), Harold Hill (2)

(1) Department of Psychology, The University of Melbourne, Melbourne, Australia
(2) Department of Vision Dynamics, ATR, Kyoto, Japan

The aim of the study was to examine how visual speech (speech-related head and face movement) varies as a function of the communicative environment. To this end, head and face movements were recorded for four talkers who uttered the same ten sentences in quiet and in four background-noise conditions (babble or white noise, presented through ear plugs or loudspeakers). The sentences were also spoken in a whisper. Changes between the normal and in-noise conditions were apparent in many of the Principal Components (PCs) of head and face movement. To simplify the analysis of differences between conditions, only the first six movement PCs were considered. The strength and composition of the changes were variable. Large changes occurred for jaw and mouth movement, face expansion and contraction, and head rotation in the Z axis. Minimal change occurred for PC3 (rigid head translation in the Z axis). Whispered speech showed many of the characteristics of speech produced in noise but was distinguished by a marked increase in head translation in the Z axis. Analyses of the correlation between auditory speech intensity and movement under the different production conditions also revealed a complex pattern of changes. The correlations between RMS speech energy and the PCs that involved jaw and mouth movement (PC1 and PC2) increased markedly from the normal to the in-noise production conditions. The correlation between RMS energy and movement also increased for head Z-rotation as a function of speaking condition. No increases were observed for the movement associated with head Z-translation, lip protrusion, or mouth opening with face contraction. These findings suggest that the relationships underlying audio-visual speech perception may differ depending on how the speech was produced.
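The kind of analysis summarized above (movement PCs, then correlation of each PC with RMS speech energy) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the SVD-based PCA, the equal-length windowing of the audio, and the synthetic data are all assumptions made purely for illustration.

```python
import numpy as np


def movement_pcs(frames, n_components=6):
    """PCA via SVD on a (n_frames, n_features) array of marker coordinates.

    Hypothetical helper: returns per-frame PC scores, the PC directions,
    and the proportion of variance explained by each retained PC.
    """
    centered = frames - frames.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]


def frame_rms(audio, n_frames):
    """RMS energy of the audio in n_frames equal-length windows (assumed alignment)."""
    usable = (len(audio) // n_frames) * n_frames
    windows = audio[:usable].reshape(n_frames, -1)
    return np.sqrt(np.mean(windows ** 2, axis=1))


def rms_pc_correlations(rms, scores):
    """Pearson correlation between the RMS contour and each PC score series."""
    return np.array(
        [np.corrcoef(rms, scores[:, k])[0, 1] for k in range(scores.shape[1])]
    )


# Synthetic stand-in data (not from the study): one movement dimension
# follows the same slow envelope that modulates the audio amplitude.
rng = np.random.default_rng(0)
n_frames = 200
envelope = 0.5 + 0.5 * np.sin(np.linspace(0, 6 * np.pi, n_frames))

frames = rng.normal(scale=0.1, size=(n_frames, 12))
frames[:, 0] += 3 * envelope  # "jaw-like" dimension driven by the envelope

audio = rng.normal(size=n_frames * 160) * np.repeat(envelope, 160)

scores, components, explained = movement_pcs(frames)
rms = frame_rms(audio, n_frames)
corr = rms_pc_correlations(rms, scores)
print(np.round(corr, 2))
```

Because the sign of a principal component is arbitrary, only the magnitude of each correlation is meaningful here. Comparing such correlations across production conditions would require running the same steps separately on the recordings from each condition.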

