ISCA Archive Interspeech 2009

Balanced corpus of informal spoken Czech: compilation, design and findings

Martina Waclawičová, Michal Křen, Lucie Válková

The paper presents ORAL2008, a new 1-million-word corpus of spoken Czech compiled within the framework of the Czech National Corpus project. ORAL2008 is designed to represent authentic spoken language used in informal situations, and it is balanced across the main sociolinguistic categories of speakers. The paper also concentrates on the data collection, its broad coverage, and the transcription system that captures the variability of spoken Czech. Finally, possible findings based on the data are outlined.


doi: 10.21437/Interspeech.2009-530

Cite as: Waclawičová, M., Křen, M., Válková, L. (2009) Balanced corpus of informal spoken Czech: compilation, design and findings. Proc. Interspeech 2009, 1819-1822, doi: 10.21437/Interspeech.2009-530

@inproceedings{waclawicova09_interspeech,
  author={Martina Waclawičová and Michal Křen and Lucie Válková},
  title={{Balanced corpus of informal spoken Czech: compilation, design and findings}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1819--1822},
  doi={10.21437/Interspeech.2009-530}
}