Auditory-Visual Speech Processing 2007 (AVSP2007)

Kasteel Groenendaal, Hilvarenbeek, The Netherlands
August 31 - September 3, 2007

Effects of Intermodal Timing Difference and Speed Difference on Intelligibility of Auditory-Visual Speech in Younger and Older Adults

Akihiro Tanaka (1), Shuichi Sakamoto (2), Komi Tsumura (2), Yôiti Suzuki (2)

(1) Department of Psychology, University of Tokyo, Japan
(2) Research Institute of Electrical Communication, Tohoku University, Japan

Previous studies have revealed a temporal window during which human observers perceive physically desynchronized auditory and visual signals as synchronous. This study investigated the effects of intermodal timing differences and speed differences on the intelligibility of auditory-visual speech. We used 20 minimal pairs of Japanese four-mora words, such as "mi-zu-a-ge" (catch landing) versus "mi-zu-a-me" (starch syrup), and administered intelligibility tests. Words were presented under visual-only, auditory-only, and auditory-visual (AV) conditions. Two types of AV conditions were used: asynchronous and expansion conditions. In the asynchronous (i.e., timing difference) conditions, the audio lag was 0-400 ms. In the expansion (i.e., speed difference) conditions, the auditory signal was time-expanded while the visual signal was kept at its original speed; the amount of expansion was 0-400 ms. Word intelligibility declined as the timing difference and the speed difference increased. Analysis of the AV benefit (i.e., the superiority of AV performance over auditory-only performance) revealed that the benefit at the end of words declined as the speed difference increased, but did not decline as the timing difference increased. These results suggest that intermodal lag recalibration requires a constant timing difference between the auditory and visual signals. Older adults recalibrated neither the timing difference nor the speed difference. These results might be useful for the design of a multimodal speech-rate conversion system.
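The difference between the two AV manipulations can be illustrated with a minimal sketch. It assumes that the time expansion is uniform over the word, so that the audio lag grows linearly from zero at word onset to the full expansion amount at word offset; the abstract does not specify the expansion method. The function names and the 800 ms word duration are hypothetical and chosen only for illustration.

```python
def lag_asynchronous(t_ms: float, audio_lag_ms: float) -> float:
    """Audio lag at time t in the asynchronous condition.

    The whole auditory signal is delayed by a fixed amount, so the lag
    is the same at every point in the word (t_ms is unused on purpose).
    """
    return audio_lag_ms


def lag_expansion(t_ms: float, duration_ms: float, expansion_ms: float) -> float:
    """Audio lag at original time t in the expansion condition.

    Assumption: the audio is stretched uniformly, so a sound originally
    at t_ms is heard at t_ms * (duration_ms + expansion_ms) / duration_ms.
    The lag relative to the unchanged video is the difference.
    """
    return t_ms * expansion_ms / duration_ms


if __name__ == "__main__":
    duration_ms = 800.0  # hypothetical word duration, not from the study
    amount_ms = 400.0    # largest lag / expansion amount used in the study
    for t_ms in (0.0, 400.0, 800.0):
        constant = lag_asynchronous(t_ms, amount_ms)
        growing = lag_expansion(t_ms, duration_ms, amount_ms)
        print(f"t={t_ms:5.0f} ms  asynchronous lag={constant:5.0f} ms  "
              f"expansion lag={growing:5.0f} ms")
```

Under this assumption the asynchronous condition gives the same lag throughout the word, whereas the expansion condition gives a lag that keeps changing and is largest at the end of the word, which is where the decline in AV benefit was observed.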

