EUROSPEECH 2001 Scandinavia
This paper is concerned with automatic recognition of children’s speech. It begins with a comparison of vowel formant frequencies for adults’ and children’s speech, and notes that in many cases the average value of F3 for children is greater than 4 kHz. Next, it is shown that recognition accuracy for children’s speech degrades rapidly as bandwidth is reduced to less than 6 kHz. Finally, it is demonstrated that the choice of front-end signal processing parameters, such as analysis window length and mel-scale filter widths, has little effect on recognition accuracy for children’s speech. It is concluded that bandwidth reduction is a major contributor to the difficulty of recognising children’s speech.
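The link between bandwidth and formant frequencies can be illustrated with a small sketch. It is not taken from the paper: it assumes that the usable bandwidth equals the Nyquist frequency (half the sampling rate), and the F3 value and the helper names `nyquist_hz` and `formant_in_band` are hypothetical, chosen only to be consistent with the statement that children's average F3 often exceeds 4 kHz.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0


def formant_in_band(formant_hz: float, bandwidth_hz: float) -> bool:
    """True if the formant lies within the available bandwidth."""
    return formant_hz <= bandwidth_hz


# Hypothetical F3 value for illustration only (not a figure from the paper).
example_f3_hz = 4200.0

# Compare a few common sampling rates; 8 kHz is typical of telephone speech.
for rate_hz in (8000, 12000, 16000):
    band = nyquist_hz(rate_hz)
    kept = formant_in_band(example_f3_hz, band)
    print(f"{rate_hz} Hz sampling: bandwidth {band:.0f} Hz, F3 retained: {kept}")
```

Under these assumptions, an F3 above 4 kHz falls outside the band at 8 kHz sampling but is retained at 12 kHz and above, which is in line with the reported degradation once bandwidth drops below 6 kHz.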
Bibliographic reference. Li, Qun / Russell, Martin J. (2001): "Why is automatic recognition of children's speech difficult?", In EUROSPEECH-2001, 2671-2674.