16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Evidence of Phonological Processes in Automatic Recognition of Children's Speech

Eva Fringi (1), Jill Fain Lehman (2), Martin Russell (1)

(1) University of Birmingham, UK
(2) Disney Research, USA

Automatic speech recognition (ASR) for children's speech is more difficult than for adults' speech. A plausible explanation is that ASR errors are due to predictable phonological effects associated with language acquisition. We describe phone recognition experiments on hand-labelled data for children aged between 5 and 9. A comparison of the resulting confusion matrices with those for adult speech (TIMIT) shows increased phone substitution rates for children, which correspond to some extent to established phonological phenomena. However, these errors account for only a relatively small proportion of the recognition errors overall. This suggests that attempts to improve ASR accuracy on children's speech by accommodating these phenomena, for example by changing the pronunciation dictionary, cannot solve the whole problem.
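
As an illustration of the kind of analysis described above, the following minimal sketch derives per-phone substitution rates from aligned reference/recognised phone pairs, which is the information a confusion matrix summarises. It is not the authors' implementation. The function name `substitution_rates`, the toy alignment, and the two example substitutions (/r/ recognised as /w/, /th/ recognised as /f/) are hypothetical and chosen only for illustration; insertions and deletions are assumed to have been removed by an earlier alignment step.

```python
from collections import Counter


def substitution_rates(pairs):
    """Return {(ref, hyp): rate} for every substitution seen in `pairs`.

    `pairs` is an iterable of (reference_phone, recognised_phone) tuples.
    The rate is the number of times `ref` was recognised as `hyp`,
    divided by the total number of occurrences of `ref`.
    """
    totals = Counter()  # occurrences of each reference phone
    subs = Counter()    # occurrences of each (ref, hyp) substitution
    for ref, hyp in pairs:
        totals[ref] += 1
        if ref != hyp:
            subs[(ref, hyp)] += 1
    return {(ref, hyp): n / totals[ref] for (ref, hyp), n in subs.items()}


# Hypothetical toy alignment, NOT data from the paper.
toy_pairs = [
    ("r", "r"), ("r", "w"), ("r", "w"), ("r", "r"),
    ("th", "f"), ("th", "th"),
]
rates = substitution_rates(toy_pairs)
```

Computing the same table for child and adult data and comparing the entries pair by pair would show which substitutions are more frequent for children, and those could then be checked against known phonological processes.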

Bibliographic reference.  Fringi, Eva / Lehman, Jill Fain / Russell, Martin (2015): "Evidence of phonological processes in automatic recognition of children's speech", In INTERSPEECH-2015, 1621-1624.