AVSP 2003 - International Conference on Audio-Visual Speech Processing

September 4-7, 2003
St. Jorioz, France

Czech Audio-Visual Speech Corpus of a Car Driver for In-Vehicle Audio-Visual Speech Recognition

Miloš Železný, Petr Císař

Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic

This paper presents the design of an audio-visual speech corpus for in-vehicle audio-visual speech recognition. Several audio-visual speech corpora exist worldwide, as do several audio-only speech corpora for in-vehicle recognition. So far, however, we have not found an audio-visual speech corpus for in-vehicle speech recognition, nor any audio-visual speech corpus for the Czech language. Since our aim is to design an audio-visual speech recognizer for in-vehicle use, our first step was to design, collect, and process a Czech in-vehicle audio-visual speech corpus.

In-vehicle speech recognition is typically used for command control of car features, so that the driver's hands are not involved. Thus, in real deployment, it is the driver whose speech will be recognized. Although collecting the driver's speech is more demanding than collecting a passenger's, we decided to collect the driver's speech for training purposes. This choice probably matters little for an audio-only speech corpus, but for our purpose we need speech recorded in real conditions, i.e. conditions that include the head movements caused by the driver having to pay attention to the traffic situation.

