We describe a corpus of children's speech, called the OGI Kids' Speech corpus, and a speaker- and vocabulary-independent recognition system trained and evaluated with these data. The corpus is composed of both prompted and spontaneous speech from 1100 children from kindergarten through grade 10. The prompted speech was presented as text appearing below an animated character (Baldi) that produced accurate visible speech synchronized with recorded prompts. The speech and text consist of isolated words, sentences, and digit strings. A phonetic recognizer was trained using an HMM/ANN framework, with training data taken from intervals of speech associated with phonetic segments in the isolated words in the corpus. Phonetic segments were derived using automatic phonetic alignment. To find out how well the recognizer is able to generalize to new words not found in the training set, we performed two test-set evaluations: one using a new set of utterances from the set of 205 words spoken in isolation (similar to the data used to train the recognizer) and one using words from the prompted sentences. Results were dramatically different (97.5% for isolated words vs. 37.9% for words in sentences), and we explore methods that may be used to improve the recognizer's ability to generalize to new words.
Cite as: Shobaki, K., Hosom, J.-P., Cole, R.A. (2000) The OGI kids' speech corpus and recognizers. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 258-261, doi: 10.21437/ICSLP.2000-800
@inproceedings{shobaki00_icslp, author={Khaldoun Shobaki and John-Paul Hosom and Ronald A. Cole}, title={{The OGI kids' speech corpus and recognizers}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 4, 258-261}, doi={10.21437/ICSLP.2000-800} }