Sixth International Conference on Spoken Language Processing
We describe a corpus of children’s speech, called the OGI Kids’ Speech corpus, and a speaker- and vocabulary-independent recognition system trained and evaluated on these data. The corpus is composed of both prompted and spontaneous speech from 1100 children from kindergarten through grade 10. Each prompt was presented as text appearing below an animated character (Baldi) that produced accurate visible speech synchronized with the recorded prompt. The speech and text consist of isolated words, sentences, and digit strings. A phonetic recognizer was trained using an HMM/ANN framework, with training data taken from the intervals of speech associated with phonetic segments in the isolated words in the corpus. Phonetic segments were derived using automatic phonetic alignment. To measure how well the recognizer generalizes to words not found in the training set, we performed two test-set evaluations: one using a new set of utterances from the set of 205 words spoken in isolation (similar to the data used to train the recognizer) and one using words from the prompted sentences. Results were dramatically different (97.5% for isolated words vs. 37.9% for words in sentences), and we explore methods that may be used to improve the recognizer’s ability to generalize to new words.
Bibliographic reference. Shobaki, Khaldoun / Hosom, John-Paul / Cole, Ronald A. (2000): "The OGI kids’ speech corpus and recognizers", In ICSLP-2000, vol.4, 258-261.