The UWB_S01 corpus is a read-speech corpus intended mainly for training Czech continuous speech recognition systems. It has been under development at the Department of Cybernetics at the University of West Bohemia in Pilsen since 1998. This paper describes the structure of the corpus and covers each step of its construction: preprocessing the texts, selecting the sentences that form the corpus, and recording and annotating the utterances.
Cite as: Radová, V., Psutka, J. (2000) UWB_S01 corpus - a Czech read-speech corpus. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 732-735, doi: 10.21437/ICSLP.2000-916
@inproceedings{radova00_icslp,
  author={Vlasta Radová and Josef Psutka},
  title={{UWB\_S01 corpus - a Czech read-speech corpus}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={732--735},
  doi={10.21437/ICSLP.2000-916}
}