INTERSPEECH 2004 - ICSLP
Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. We propose that integrating video and audio information can significantly improve dialog system performance, since the visual modality is not affected by acoustic noise. In this paper, we present a robust audio-visual integration system for source tracking and speech enhancement in an in-vehicle speech dialog system. The proposed system combines audio and visual information to locate the desired speaker. On real data collected in car environments, the proposed system improves speech recognition accuracy by up to 40.75% compared with using audio data alone.
Bibliographic reference. Zhang, Xianxian / Takeda, Kazuya / Hansen, John H. L. / Maeno, Toshiki (2004): "Audio-visual speaker localization for car navigation systems", In INTERSPEECH-2004, 2501-2504.