8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

AVICAR: Audio-Visual Speech Corpus in a Car Environment

Bowon Lee (1), Mark Hasegawa-Johnson (1), Camille Goudeseune (2), Suketu Kamdar (1), Sarah Borys (1), Ming Liu (1), Thomas Huang (1)

(1) University of Illinois at Urbana-Champaign, USA
(2) Beckman Institute, USA

We describe a large audio-visual speech corpus recorded in a car environment, together with the equipment and procedures used to build it. Data are collected with a multi-sensor array consisting of eight microphones on the sun visor and four video cameras on the dashboard. The script for the corpus comprises four categories, all in English: isolated digits, isolated letters, phone numbers, and sentences. The 100 speakers (50 male, 50 female) come from a variety of language backgrounds. To vary the signal-to-noise ratio, each script is recorded under five noise conditions: engine idling, driving at 35 mph with windows open and closed, and driving at 55 mph with windows open and closed. The corpus is available through .


Bibliographic reference.  Lee, Bowon / Hasegawa-Johnson, Mark / Goudeseune, Camille / Kamdar, Suketu / Borys, Sarah / Liu, Ming / Huang, Thomas (2004): "AVICAR: audio-visual speech corpus in a car environment", in INTERSPEECH-2004, 2489-2492.