As part of a long-term project to develop speech recognition systems for young computer users, specifically children aged between 6 and 11 years, this paper presents a preliminary investigation into the classification of children's vowels. In earlier studies of adult speech we found that dynamic, or time-varying, cues were useful in classifying diphthongal vowels but provided no advantage for monophthongs when duration was included as an additional cue. In this study we investigate whether dynamic cues, modelled by Discrete Cosine Transform (DCT) coefficients, are present to a greater or lesser extent in children's vowels. Our hypothesis is that some of the observed variability in children's vowels may be due to systematic time-varying features. We found that the children's monophthong data were better separated by a combination of DCT coefficients and vowel duration than by formant values sampled at the vowel midpoint plus duration. This result contrasts with our findings on Australian adult speech, for which modelling the formant trajectory was necessary only to separate the diphthongs.
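The two feature sets compared above can be illustrated with a short sketch: one reduces each formant trajectory to its first few DCT coefficients and appends vowel duration; the other keeps only the formant values at the vowel midpoint plus duration. This is a minimal sketch, not the authors' implementation. It assumes a DCT-II scaled so that the zeroth coefficient equals the track mean, three coefficients per formant, and F1/F2 tracks only; the paper's exact normalization, coefficient count, and choice of formants are not stated here. The function names are hypothetical.

```python
import math


def dct_coefficients(track, num_coeffs=3):
    """Return the first num_coeffs DCT-II coefficients of a formant track.

    Assumed scaling (hypothetical choice): C0 equals the track mean,
    C1 reflects the overall slope (negative for a rising track under this
    convention), and C2 reflects curvature (negative for an arch shape).
    """
    n = len(track)
    if n == 0:
        raise ValueError("track must contain at least one sample")
    coeffs = []
    for m in range(num_coeffs):
        scale = 1.0 / n if m == 0 else 2.0 / n
        total = sum(
            x * math.cos(math.pi * m * (2 * i + 1) / (2 * n))
            for i, x in enumerate(track)
        )
        coeffs.append(scale * total)
    return coeffs


def dct_feature_vector(f1_track, f2_track, duration_ms, num_coeffs=3):
    """Dynamic features: DCT coefficients of F1 and F2 plus vowel duration."""
    return (
        dct_coefficients(f1_track, num_coeffs)
        + dct_coefficients(f2_track, num_coeffs)
        + [duration_ms]
    )


def midpoint_feature_vector(f1_track, f2_track, duration_ms):
    """Static features: F1 and F2 at the vowel midpoint plus vowel duration."""
    return [f1_track[len(f1_track) // 2], f2_track[len(f2_track) // 2], duration_ms]


if __name__ == "__main__":
    # Synthetic illustrative tracks in Hz, not data from the study.
    f1 = [600.0, 680.0, 720.0, 700.0, 640.0]
    f2 = [1500.0, 1600.0, 1700.0, 1800.0, 1900.0]
    print(dct_feature_vector(f1, f2, 150.0))
    print(midpoint_feature_vector(f1, f2, 150.0))
```

Either vector could then be passed to a classifier; a steady monophthong yields near-zero C1 and C2, so any separation gained from the DCT set over the midpoint set would have to come from systematic movement in the trajectory.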
Cite as: Cassidy, S., Watson, C. (1998) Dynamic features in children's vowels. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0664, doi: 10.21437/ICSLP.1998-513
@inproceedings{cassidy98_icslp, author={Steve Cassidy and Catherine Watson}, title={{Dynamic features in children's vowels}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0664}, doi={10.21437/ICSLP.1998-513} }