Sixth International Conference on Spoken Language Processing
The UWB_S01 corpus is a read-speech corpus intended mainly for training Czech continuous speech recognition systems. It has been under development at the Department of Cybernetics at the University of West Bohemia in Pilsen since 1998. This paper describes the structure of the corpus and covers each step of its construction: preprocessing the texts, selecting the sentences that form the corpus, and recording and annotating the utterances.
Bibliographic reference. Radová, Vlasta / Psutka, Josef (2000): "UWB_S01 corpus - a Czech read-speech corpus", in ICSLP-2000, vol. 4, 732-735.