Corpora Design and Score Calibration for Text Dependent Pronunciation Proficiency Recognition

Fred Richardson, John Steinberg, Gordon Vidaver, Steve Feinstein, Ray Budd, Jennifer Melot, Paul Gatewood, Douglas Jones


This work investigates methods for improving a pronunciation proficiency recognition system for Modern Standard Arabic (MSA), Spanish and Russian, both in the calibration of phonetic-level posterior probabilities and in ordinal utterance-level classification. To support this work, utterance-level labels were obtained by crowd-sourcing the annotation of language learners' recordings. Phonetic posterior probability estimates extracted using automatic speech recognition systems trained in each language were calibrated using a beta calibration approach [1], and language proficiency level was estimated using an ordinal regression [2]. Fusion with language recognition (LR) scores from an i-vector system [3] trained on 23 languages is also explored. Initial results were promising for all three languages, and the calibrated posteriors were shown to be effective for predicting pronunciation proficiency. Through fusion with LR scores, significant relative gains of 16% in mean absolute error for the ordinal regression and 17% in normalized cross entropy for the binary beta regression were achieved on MSA.
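As an illustration of the calibration step mentioned in the abstract, the following is a minimal sketch of a three-parameter beta calibration map, p = sigmoid(a ln s - b ln(1 - s) + c), fitted as a logistic regression on the features ln s and -ln(1 - s). It assumes this common form is close to the approach in [1]; the function names, the plain gradient-descent fit, the learning rate and the absence of any sign constraint on a and b are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def _features(scores, eps=1e-6):
    """Map raw posteriors s to the beta calibration features [ln s, -ln(1 - s), 1]."""
    s = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    return np.column_stack([np.log(s), -np.log1p(-s), np.ones_like(s)])


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross entropy (in nats) of probabilities against 0/1 labels."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_beta_calibration(scores, labels, lr=0.05, n_iter=2000):
    """Fit weights (a, b, c) by gradient descent on the cross entropy.

    Hypothetical helper for illustration: starts from the identity map
    (a=1, b=1, c=0) and returns the best weights seen, so the training
    loss is never worse than that of the uncalibrated scores.
    """
    X = _features(scores)
    y = np.asarray(labels, dtype=float)
    w = np.array([1.0, 1.0, 0.0])  # identity map: sigmoid(logit(s)) = s
    best_w, best_loss = w.copy(), cross_entropy(_sigmoid(X @ w), y)
    for _ in range(n_iter):
        grad = X.T @ (_sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
        loss = cross_entropy(_sigmoid(X @ w), y)
        if loss < best_loss:
            best_w, best_loss = w.copy(), loss
    return best_w


def apply_beta_calibration(scores, w):
    """Apply fitted weights (a, b, c) to raw posteriors."""
    return _sigmoid(_features(scores) @ w)
```

In use, `fit_beta_calibration` would be run on held-out posteriors with binary correctness labels, and `apply_beta_calibration` on new posteriors; comparing `cross_entropy` before and after gives a rough sense of the calibration gain, although it is not the normalized cross entropy figure reported in the abstract.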


 DOI: 10.21437/SLaTE.2019-12

Cite as: Richardson, F., Steinberg, J., Vidaver, G., Feinstein, S., Budd, R., Melot, J., Gatewood, P., Jones, D. (2019) Corpora Design and Score Calibration for Text Dependent Pronunciation Proficiency Recognition. Proc. SLaTE 2019: 8th ISCA Workshop on Speech and Language Technology in Education, 64-68, DOI: 10.21437/SLaTE.2019-12.


@inproceedings{Richardson2019,
  author={Fred Richardson and John Steinberg and Gordon Vidaver and Steve Feinstein and Ray Budd and Jennifer Melot and Paul Gatewood and Douglas Jones},
  title={{Corpora Design and Score Calibration for Text Dependent Pronunciation Proficiency Recognition}},
  year=2019,
  booktitle={Proc. SLaTE 2019: 8th ISCA Workshop on Speech and Language Technology in Education},
  pages={64--68},
  doi={10.21437/SLaTE.2019-12},
  url={http://dx.doi.org/10.21437/SLaTE.2019-12}
}