Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW

Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu


Shadowing has become a well-known method for improving learners’ overall proficiency. Our previous studies realized automatic scoring of shadowing speech using HMM-based phoneme posteriors, known as GOP (Goodness of Pronunciation), and predicted learners’ TOEIC scores adequately. In this study, we extend that work in four directions: 1) a much larger amount of shadowing speech is collected, 2) these utterances are scored manually by two native teachers, 3) DNN posteriors are introduced in place of HMM ones, and 4) language-independent shadowing assessment based on posterior-based DTW (Dynamic Time Warping) is examined. Experiments suggest that, compared to HMM, DNN improves teacher-machine correlation substantially, by 0.37, and that DTW based on DNN posteriors achieves a correlation as high as 0.74 even when the posteriors are calculated using a language different from the target language of learning.
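As a rough illustration of posterior-based DTW, the sketch below aligns two sequences of posterior vectors (one row per frame, one column per phoneme class) and returns a length-normalized alignment cost. It is not the authors' implementation. The comparison of a shadowing utterance against a reference utterance, the Bhattacharyya local distance, the step pattern, the normalization, and the names `frame_distance` and `dtw_posteriors` are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def frame_distance(p, q, eps=1e-10):
    """Bhattacharyya distance between two posterior vectors.

    Assumed local distance for this sketch; the abstract does not name one.
    """
    return -np.log(np.sum(np.sqrt(p * q)) + eps)


def dtw_posteriors(ref, shadow):
    """Length-normalized DTW cost between two posterior sequences.

    ref, shadow: arrays of shape (frames, classes), each row summing to 1.
    A lower cost means the two posterior trajectories are more similar.
    """
    n, m = len(ref), len(shadow)

    # Local distance between every pair of frames.
    local = np.array([[frame_distance(r, s) for s in shadow] for r in ref])

    # Accumulated cost with the basic symmetric step pattern (assumed).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in ref only
                acc[i, j - 1],      # advance in shadow only
                acc[i - 1, j - 1],  # advance in both
            )

    # Normalize by total length so utterances of different duration compare.
    return acc[n, m] / (n + m)


# Toy posteriors: three frames, each fully confident in a different class.
ref = np.eye(3)
slow = np.repeat(ref, 2, axis=0)  # same content spoken at half speed
reversed_ref = ref[::-1]          # classes in the opposite order

cost_same = dtw_posteriors(ref, ref)
cost_slow = dtw_posteriors(ref, slow)
cost_reversed = dtw_posteriors(ref, reversed_ref)
```

Because the alignment operates only on posterior vectors, nothing in this sketch depends on which language the posterior-producing model was trained on, which is the property the language-independent assessment relies on. In the toy data, the time-stretched copy aligns at near-zero cost while the reordered sequence does not.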


DOI: 10.21437/Interspeech.2017-728

Cite as: Yue, J., Shiozawa, F., Toyama, S., Yamauchi, Y., Ito, K., Saito, D., Minematsu, N. (2017) Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW. Proc. Interspeech 2017, 1422-1426, DOI: 10.21437/Interspeech.2017-728.


@inproceedings{Yue2017,
  author={Junwei Yue and Fumiya Shiozawa and Shohei Toyama and Yutaka Yamauchi and Kayoko Ito and Daisuke Saito and Nobuaki Minematsu},
  title={Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1422--1426},
  doi={10.21437/Interspeech.2017-728},
  url={http://dx.doi.org/10.21437/Interspeech.2017-728}
}