8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Design and Recording of Czech Sign Language Corpus for Automatic Sign Language Recognition

Pavel Campr, Marek Hrúz, Miloš Železný

University of West Bohemia in Pilsen, Czech Republic

This paper describes the design, recording and content of a Czech Sign Language database intended for training and testing sign language recognition (SLR) systems. The UWB-06-SLR-A database contains video data of 15 signers recorded from 3 different views: two of them capture the whole body and provide 3D motion data, and the third is focused on the signer's face and provides data for facial expression feature extraction and for lipreading.

The corpus consists of nearly 5 hours of processed and annotated video files recorded under laboratory conditions with static illumination. The whole corpus is annotated and pre-processed so that it is ready for use in SLR experiments. It comprises 25 selected signs from Czech Sign Language, and each signer performed each sign 5 times. Altogether the database contains more than 5500 video files, each containing one isolated sign.
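The file count follows from the figures above. The short sketch below multiplies them out, assuming (this is not stated explicitly in the abstract) that every signer, sign and repetition was captured once by each of the 3 cameras and stored as a separate file. The per-file duration is only a rough upper bound that treats the "nearly 5 hours" as the total over all files.

```python
# Corpus dimensions as stated in the abstract.
NUM_SIGNERS = 15
NUM_SIGNS = 25
NUM_REPETITIONS = 5
NUM_VIEWS = 3


def expected_file_count(signers, signs, repetitions, views):
    """Number of isolated-sign files if every combination is stored once per view."""
    return signers * signs * repetitions * views


# Distinct sign performances (before counting camera views).
performances = NUM_SIGNERS * NUM_SIGNS * NUM_REPETITIONS

# One file per performance and per camera view (assumption).
total_files = expected_file_count(NUM_SIGNERS, NUM_SIGNS, NUM_REPETITIONS, NUM_VIEWS)

# Rough upper bound on mean file length, assuming 5 hours in total (assumption).
mean_seconds_upper_bound = 5 * 3600 / total_files

print(performances)              # 1875
print(total_files)               # 5625
print(mean_seconds_upper_bound)  # 3.2
```

Under these assumptions the product is 5625 files, which agrees with the stated "more than 5500".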

The purpose of the corpus is to provide data for the evaluation of visual parameterizations and sign language recognition techniques. The corpus is pre-processed, and each video file is supplemented with an XML data file. This file provides information about the performed sign (name of sign, type of sign), the signer (identification, left- or right-handedness), the scene (camera position, calibration matrices) and the pre-processed data (regions of interest, hand and head trajectories in 3D space).
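To illustrate how such a per-file XML description might be consumed, here is a minimal sketch. The abstract does not give the actual schema, so every element name, attribute name and value below (`recording`, `sign`, `signer`, `scene`, `trajectory`, `point`) is a hypothetical stand-in chosen only to mirror the categories listed above; calibration matrices and regions of interest are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical annotation file. The element and attribute names are
# illustrative guesses, NOT the actual UWB-06-SLR-A schema.
SAMPLE_XML = """
<recording>
  <sign name="example_sign" type="example_type"/>
  <signer id="01" handedness="right"/>
  <scene camera="front"/>
  <trajectory object="right_hand">
    <point frame="0" x="0.10" y="0.20" z="1.50"/>
    <point frame="1" x="0.12" y="0.21" z="1.49"/>
  </trajectory>
</recording>
"""


@dataclass
class SignRecording:
    sign_name: str
    sign_type: str
    signer_id: str
    handedness: str
    camera: str
    trajectories: dict  # tracked object name -> list of (x, y, z) tuples


def parse_recording(xml_text: str) -> SignRecording:
    """Parse one (hypothetical) annotation file into a SignRecording."""
    root = ET.fromstring(xml_text.strip())
    sign = root.find("sign")
    signer = root.find("signer")
    scene = root.find("scene")

    # Collect one list of 3D points per tracked object (hands, head).
    trajectories = {}
    for traj in root.findall("trajectory"):
        trajectories[traj.get("object")] = [
            (float(p.get("x")), float(p.get("y")), float(p.get("z")))
            for p in traj.findall("point")
        ]

    return SignRecording(
        sign_name=sign.get("name"),
        sign_type=sign.get("type"),
        signer_id=signer.get("id"),
        handedness=signer.get("handedness"),
        camera=scene.get("camera"),
        trajectories=trajectories,
    )


rec = parse_recording(SAMPLE_XML)
print(rec.sign_name, rec.signer_id, len(rec.trajectories["right_hand"]))
```

Collecting the metadata into one record per video file would make it straightforward to filter the corpus by signer, sign or camera view before feature extraction.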

The presented database has been collected and pre-processed, and is ready to use for subsequent experiments on sign language recognition.

Audio-Visual Example
- Sample of recorded sign from 3 different views - front camera, upper camera, and face camera

