INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Audio-Visual Speaker Localization for Car Navigation Systems

Xianxian Zhang (1), Kazuya Takeda (2), John H. L. Hansen (1), Toshiki Maeno (2)

(1) University of Colorado at Boulder, USA
(2) Nagoya University, Japan

Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. It is proposed that integrating video and audio information can significantly improve dialog system performance, since the visual modality is not affected by acoustic noise. In this paper, we propose a robust audio-visual integration system for source tracking and speech enhancement in an in-vehicle speech dialog system. The proposed system combines audio and visual information to locate the desired speaker. Evaluations on real data collected in car environments show that the proposed system improves speech accuracy by up to 40.75% compared with using audio data alone.
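As a rough illustration of the kind of audio-visual fusion the abstract describes (not the authors' actual algorithm), the sketch below estimates a speaker bearing from a two-microphone time-delay measurement and combines it with a visual bearing (such as the output of a face tracker) by inverse-variance weighting. All function names, microphone geometry, and noise figures are assumptions chosen for demonstration only.

# Illustrative sketch only: audio DOA from cross-correlation, fused with a
# visual bearing by inverse-variance weighting. Parameters are assumptions.
import numpy as np

C = 343.0  # speed of sound, m/s


def audio_doa(left, right, fs, mic_spacing):
    """Estimate azimuth (radians) from a two-microphone pair using the
    peak of the cross-correlation as a time-difference-of-arrival."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)        # delay in samples
    tdoa = lag / fs                                  # delay in seconds
    sin_theta = np.clip(tdoa * C / mic_spacing, -1.0, 1.0)
    return np.arcsin(sin_theta)                      # far-field model


def fuse_bearings(theta_audio, var_audio, theta_video, var_video):
    """Inverse-variance weighted fusion: the less reliable modality
    (larger variance) contributes less to the combined estimate."""
    w_a, w_v = 1.0 / var_audio, 1.0 / var_video
    return (w_a * theta_audio + w_v * theta_video) / (w_a + w_v)


if __name__ == "__main__":
    fs, spacing, frame = 16000, 0.15, 4000
    true_theta = np.deg2rad(20.0)                    # simulated speaker azimuth
    delay = int(round(spacing * np.sin(true_theta) / C * fs))

    rng = np.random.default_rng(0)
    src = rng.standard_normal(fs)                    # synthetic "speech"
    right = src[1000:1000 + frame]
    left = src[1000 - delay:1000 - delay + frame]    # left mic hears it later
    left = left + 0.3 * rng.standard_normal(frame)   # additive car-like noise
    right = right + 0.3 * rng.standard_normal(frame)

    theta_a = audio_doa(left, right, fs, spacing)
    theta_v = true_theta + np.deg2rad(2.0)           # stand-in face-tracker output
    theta_f = fuse_bearings(theta_a, np.deg2rad(8.0) ** 2,
                            theta_v, np.deg2rad(3.0) ** 2)
    print(f"audio: {np.degrees(theta_a):5.1f} deg, "
          f"video: {np.degrees(theta_v):5.1f} deg, "
          f"fused: {np.degrees(theta_f):5.1f} deg")

The fixed weights here stand in for the reliability modeling a real system would need; in practice the audio variance would grow with in-car noise level while the visual variance would depend on lighting and tracking confidence.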


Bibliographic reference. Zhang, Xianxian / Takeda, Kazuya / Hansen, John H. L. / Maeno, Toshiki (2004): "Audio-visual speaker localization for car navigation systems", In INTERSPEECH-2004, 2501-2504.