We investigated the use of context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) to improve speech recognition performance, and thereby the automatic assessment of child English language learners (ELLs). The ELL data used in the present study were obtained from a large language assessment project administered in schools in a U.S. state. Our DNN-based speech recognition system, built using rectified linear units (ReLU), greatly outperformed Gaussian mixture model (GMM)-HMMs in recognition accuracy, even when the latter were trained with eight times more data. Large improvements were observed for noisy and/or unclear responses, which are common in ELL children's speech. We further explored the use of content and manner-of-speaking features, derived from the speech recognizer output, for estimating spoken English proficiency levels. Experimental results show that the DNN-based recognition approach achieved a 31% relative word error rate (WER) reduction compared to GMM-HMMs. This in turn improved the quality of the extracted features and of the final spoken English proficiency scores, and raised overall automatic assessment performance to the level of human performance for various open-ended spoken language tasks.
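The headline result is a *relative* reduction in word error rate, not an absolute one. As a minimal sketch (not the authors' scoring tool; the function names `word_error_rate` and `relative_wer_reduction` are hypothetical), WER can be computed as the word-level Levenshtein distance between a reference transcript and the recognizer output, divided by the number of reference words. The relative reduction is then (WER_GMM - WER_DNN) / WER_GMM. The WER values in the usage note are purely hypothetical, chosen only to show what a 31% relative reduction means; they are not results from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein edit distance (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs: a baseline of 50.0% reduced to 34.5% is a 31% relative reduction.
    print(relative_wer_reduction(0.50, 0.345))
```

Note that a 31% relative reduction says nothing by itself about absolute accuracy: under these hypothetical numbers the absolute drop is 15.5 percentage points, whereas from a 10% baseline the same relative figure would mean an absolute drop of only 3.1 points.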
Bibliographic reference. Metallinou, Angeliki / Cheng, Jian (2014): "Using deep neural networks to improve proficiency assessment for children English language learners", In INTERSPEECH-2014, 1468-1472.