8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Spectro-Temporal Interactions in Auditory and Auditory-Visual Speech Processing

Ken W. Grant (1), Steven Greenberg (2)

(1) Walter Reed Army Medical Center, USA
(2) The Speech Institute, USA

Speech recognition often takes place during face-to-face communication between two or more individuals. The combined influence of auditory and visual speech information yields a remarkably robust signal that is highly resistant to noise, reverberation, hearing loss, and other forms of signal distortion. Studies of auditory-visual speech processing have revealed that speechreading interacts with audition in both the spectral and temporal domains. For example, not all speech frequencies are equal in their ability to supplement speechreading: low-frequency speech cues provide more benefit than high-frequency speech cues. Additionally, in contrast to auditory speech processing, which integrates information across frequency over relatively short time windows (20-40 ms), auditory-visual speech processing appears to use relatively long time windows of integration (roughly 250 ms). In this paper, some of the basic spectral and temporal interactions between auditory and visual speech channels are enumerated and discussed.

