Acoustic models for state-of-the-art DNN-based speech recognition systems are typically trained on several hundred hours or more of task-specific training data. However, such data is not always available for real applications. In this paper, we investigate how to use an adult speech corpus to improve DNN-based automatic recognition of non-native children's speech for spoken assessment applications. Although there are many acoustic and linguistic mismatches between the speech of adults and that of children, adult speech can still boost the performance of a children's speech recognizer when used with DNN-based acoustic modeling techniques. The experimental results show that the best recognition performance is obtained by combining the children's training data with a roughly equal amount of adult data and initializing the DNN with weights pre-trained on the full adult corpus. This system outperforms the baseline built on only the children's training data by an overall relative WER reduction of 11.9%. The picture narration task achieves the largest gain among the three tasks, with WER reduced from 24.6% to 20.1%.
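The abstract describes a transfer-learning-style recipe: pre-train a DNN acoustic model on the full adult corpus, re-use those weights to initialize a new model, and then train it on the children's data pooled with a similar amount of adult data. The following PyTorch sketch only illustrates that general recipe under stated assumptions; it is not the authors' implementation (the paper's experiments were presumably run in a standard DNN-HMM toolkit), and the network dimensions, the fake_corpus placeholder data, and all identifiers are illustrative.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

class AcousticDNN(nn.Module):
    """Feed-forward DNN mapping spliced acoustic features to senone posteriors."""
    def __init__(self, feat_dim=440, hidden=1024, num_senones=3000, num_layers=4):
        super().__init__()
        dims = [feat_dim] + [hidden] * num_layers
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.hidden = nn.Sequential(*layers)
        self.output = nn.Linear(hidden, num_senones)

    def forward(self, x):
        return self.output(self.hidden(x))

def train(model, loader, epochs=2, lr=1e-3):
    """Frame-level cross-entropy training against senone targets."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for feats, senones in loader:
            opt.zero_grad()
            loss_fn(model(feats), senones).backward()
            opt.step()
    return model

def fake_corpus(n_frames, feat_dim=440, num_senones=3000):
    """Placeholder frame-level data standing in for real features/alignments."""
    return TensorDataset(torch.randn(n_frames, feat_dim),
                         torch.randint(0, num_senones, (n_frames,)))

adult_set, child_set = fake_corpus(5000), fake_corpus(1000)
adult_subset = fake_corpus(1000)  # adult data of roughly the same size as the child set

# 1) Pre-train the DNN on the full adult corpus.
adult_model = train(AcousticDNN(), DataLoader(adult_set, batch_size=256, shuffle=True))

# 2) Initialize a new DNN with the adult-trained hidden layers, then train it on
#    the children's data pooled with a similarly sized amount of adult data.
child_model = AcousticDNN()
child_model.hidden.load_state_dict(adult_model.hidden.state_dict())
pooled = ConcatDataset([child_set, adult_subset])
child_model = train(child_model, DataLoader(pooled, batch_size=256, shuffle=True))

The key design choice mirrored here is that only the pre-trained weights carry over from the adult corpus, while the pooled training set keeps the adult and child portions at comparable sizes so the adult data does not overwhelm the in-domain child data.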
Cite as: Qian, Y., Wang, X., Evanini, K., Suendermann-Oeft, D. (2016) Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech. Proc. 5th Workshop on Child Computer Interaction (WOCCI 2016), 40-44, doi: 10.21437/WOCCI.2016-7
@inproceedings{qian16_wocci,
  author={Yao Qian and Xinhao Wang and Keelan Evanini and David Suendermann-Oeft},
  title={{Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech}},
  year={2016},
  booktitle={Proc. 5th Workshop on Child Computer Interaction (WOCCI 2016)},
  pages={40--44},
  doi={10.21437/WOCCI.2016-7}
}