Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

UWB_S01 Corpus - A Czech Read-Speech Corpus

Vlasta Radová, Josef Psutka

University of West Bohemia, Department of Cybernetics, Plzen, Czech Republic

The UWB_S01 corpus is a read-speech corpus intended mainly for training Czech continuous speech recognition systems. It has been under development at the Department of Cybernetics of the University of West Bohemia in Pilsen since 1998. This paper describes the structure of the corpus and covers each step of its construction: preprocessing the texts, selecting the sentences that form the corpus, and recording and annotating the utterances.

Bibliographic reference. Radová, Vlasta / Psutka, Josef (2000): "UWB_S01 corpus - a Czech read-speech corpus", in ICSLP-2000, vol. 4, 732-735.