15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Using Deep Neural Networks to Improve Proficiency Assessment for Children English Language Learners

Angeliki Metallinou, Jian Cheng

Pearson, USA

We investigated the use of context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) to improve speech recognition performance and thereby the assessment of child English language learners (ELLs). The ELL data used in the present study was obtained from a large language assessment project administered in schools in a U.S. state. Our DNN-based speech recognition system, built using rectified linear units (ReLU), achieved much higher recognition accuracy than Gaussian mixture model (GMM)-HMMs, even when the latter were trained with eight times more data. Large improvements were observed for noisy and/or unclear responses, which are common in ELL children's speech. We further explored the use of content and manner-of-speaking features, derived from the speech recognizer output, for estimating spoken English proficiency levels. Experimental results show that the DNN-based recognition approach achieved a 31% relative reduction in word error rate (WER) compared to GMM-HMMs. This in turn improved the quality of the extracted features and of the final spoken English proficiency scores, and raised overall automatic assessment performance to the human performance level for various open-ended spoken language tasks.
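To make the reported metric concrete, the following minimal sketch shows how WER and a relative WER reduction are conventionally computed. It is not the authors' scoring code: the function names are hypothetical, and the numeric WER values in the usage lines are illustrative placeholders chosen only so that the arithmetic yields 31%; they are not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumed, not taken from the paper).
hypothetical_baseline_wer = 0.40
hypothetical_new_wer = 0.276
reduction = relative_wer_reduction(hypothetical_baseline_wer, hypothetical_new_wer)
print(f"relative WER reduction: {reduction:.0%}")
```

Note that a relative reduction is measured against the baseline error rate, so a 31% relative reduction does not mean that WER fell by 31 absolute percentage points.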


Bibliographic reference. Metallinou, Angeliki / Cheng, Jian (2014): "Using deep neural networks to improve proficiency assessment for children English language learners", in INTERSPEECH-2014, 1468-1472.