Auditory-Visual Speech Processing 2005

British Columbia, Canada
July 24-27, 2005

Design and Recording of Czech Speech Corpus for Audio-Visual Continuous Speech Recognition

Petr Cisar, Milos Zelezny, Zdenek Krnoul, Jakub Kanis, Jan Zelinka, Ludek Müller

Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic

In this paper we describe the design, recording, and content of a large audio-visual speech database intended for training and testing audio-visual continuous speech recognition systems. The UWB-05-HSCAVC database contains high-resolution video and high-quality audio data suitable for experiments on audio-visual speech recognition.

The corpus consists of nearly 40 hours of audio-visual recordings of 100 speakers made under laboratory conditions. The whole database was collected under static illumination, and the recorded subjects were asked to remain still, with almost no head movement. The whole corpus is annotated and pre-processed so that it is ready to use in audio-visual speech recognition experiments.

The purpose of the corpus is to provide data for the evaluation of visual speech parameterizations. The corpus pre-processing was designed for use with both image-based and contour-based visual speech parameterizations. Head tracking has been carried out and its output is provided with the database. Users thus do not need to locate the region of interest themselves, since this information is attached to each frame of the database recordings.

The presented database has been collected, annotated, and pre-processed and is ready to use for subsequent experiments on visual speech parameterizations.

