This paper describes the continued development of a system that provides early assessment of speech development issues in children and better triaging to professional services. Whilst corpora of children’s speech are increasingly available, recognition of disordered children’s speech is still a data-scarce task. Transfer learning methods have been shown to be effective at leveraging out-of-domain data to improve ASR performance in similar data-scarce applications. This paper combines transfer learning with previously developed methods for constrained decoding based on expert speech pathology knowledge and knowledge of the target text. Results of this study show that transfer learning with out-of-domain adult speech can improve phoneme recognition for disordered children’s speech. Specifically, a Deep Neural Network (DNN) trained on adult speech and fine-tuned on a corpus of disordered children’s speech achieved a phoneme error rate (PER) of 14.2%, compared with 16.3% for a DNN trained on a children’s corpus. Furthermore, this fine-tuned DNN also outperformed the Hierarchical Neural Network based acoustic model previously used by the system, which had a PER of 19.3%. We close with a discussion of our planned future developments of the system.
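For readers unfamiliar with the metric, the following is a minimal sketch of how a phoneme error rate is commonly computed: the Levenshtein (edit) distance between the reference and recognised phoneme sequences, divided by the reference length. It assumes the standard definition of PER; the abstract does not state the exact scoring procedure used in the paper, and the function names and the example phoneme labels are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER in percent, assuming the common definition:
    (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative reduction in percent between two error rates."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical example: one substitution in a three-phoneme target.
per = phoneme_error_rate(["k", "ae", "t"], ["t", "ae", "t"])

# Relative reduction implied by the two PER figures quoted in the abstract.
reduction = relative_reduction(16.3, 14.2)
```

Applied to the figures above, the drop from 16.3% to 14.2% is an absolute reduction of 2.1 percentage points, or about 12.9% relative.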
Cite as: Smith, D., Sneddon, A., Ward, L., Duenser, A., Freyne, J., Silvera-Tawil, D., Morgan, A. (2017) Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech. Proc. Interspeech 2017, 2690-2694, doi: 10.21437/Interspeech.2017-455
@inproceedings{smith17_interspeech,
  author={Daniel Smith and Alex Sneddon and Lauren Ward and Andreas Duenser and Jill Freyne and David Silvera-Tawil and Angela Morgan},
  title={{Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2690--2694},
  doi={10.21437/Interspeech.2017-455}
}