Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children’s Speech

Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland


Recent advances in ASR and spoken language processing have led to improved systems for the automated assessment of spoken language. However, it is still challenging for automated scoring systems to achieve high performance, in terms of agreement with human experts, when applied to non-native children’s spontaneous speech. The subpar performance is mainly caused by the relatively low recognition rate on non-native children’s speech. In this paper, we investigate different neural network architectures for improving non-native children’s speech recognition and the impact of the features extracted from the corresponding ASR output on the automated assessment of speaking proficiency. Experimental results show that a bidirectional LSTM-RNN can outperform a feed-forward DNN in ASR, with an overall relative WER reduction of 13.4%. The improved speech recognition can then boost the language proficiency assessment performance. Correlations between the rounded automated scores and expert scores range from 0.66 to 0.70 for the three speaking tasks studied, similar to the human-human agreement levels for these tasks.
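As a sketch of the two evaluation metrics the abstract reports, the snippet below computes word error rate (WER) via word-level Levenshtein distance and the relative WER reduction between two systems. The example transcripts and error rates are hypothetical, not from the paper's data.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein distance over whitespace tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative (not absolute) improvement of the new system over the baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# One deleted word out of six reference words -> WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
# Hypothetical WERs: dropping from 30% to 26% is a ~13.3% relative reduction
print(relative_wer_reduction(0.30, 0.26))
```

The 13.4% figure in the abstract is a relative reduction of this kind, so it describes the improvement as a fraction of the baseline WER rather than a 13.4-point absolute drop.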


DOI: 10.21437/Interspeech.2017-250

Cite as: Qian, Y., Evanini, K., Wang, X., Lee, C.M., Mulholland, M. (2017) Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children’s Speech. Proc. Interspeech 2017, 1417-1421, DOI: 10.21437/Interspeech.2017-250.


@inproceedings{Qian2017,
  author={Yao Qian and Keelan Evanini and Xinhao Wang and Chong Min Lee and Matthew Mulholland},
  title={Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children’s Speech},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={1417--1421},
  doi={10.21437/Interspeech.2017-250},
  url={http://dx.doi.org/10.21437/Interspeech.2017-250}
}