Speech and Language Technology in Education (SLaTE 2013)
In this paper, we present preliminary results on applying a comparison-based framework to the task of pronunciation scoring. The comparison-based system works by aligning a student's utterance with a teacher's utterance via dynamic time warping (DTW). Features that describe the degree of misalignment are extracted from the aligned path and the distance matrix. We focus on a dataset in Levantine Arabic, a low-resource language for which sufficient automatic speech recognition (ASR) capability is not available. Three speech representations are investigated: MFCCs, Gaussian posteriorgrams, and English phoneme state posteriorgrams decoded on Levantine data. Experimental results show that, compared to a template-based system, the comparison-based system improves both the correlation and the mean squared error between machine-predicted scores and human ratings.
Index Terms: pronunciation scoring, dynamic time warping, posteriorgrams
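The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes each utterance is already a NumPy array of per-frame feature vectors (for example MFCCs) and uses Euclidean frame distance, which may not be the distance the authors used. The function names and the two example features (`mean_path_distance`, `off_diagonal_fraction`) are hypothetical illustrations of "features that describe the degree of misalignment"; the abstract does not list the paper's actual feature set.

```python
import numpy as np


def distance_matrix(teacher, student):
    """Pairwise Euclidean distances between frames of two feature sequences."""
    diff = teacher[:, None, :] - student[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def dtw_path(dist):
    """Return (total_cost, path) of the minimum-cost monotonic alignment."""
    n, m = dist.shape
    # acc[i, j] holds the best cost of aligning the first i and j frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev

    # Backtrack from the end of both sequences to the first frame pair.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),  # diagonal step
            (acc[i - 1, j], i - 1, j),          # teacher advances alone
            (acc[i, j - 1], i, j - 1),          # student advances alone
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def misalignment_features(dist, path):
    """Hypothetical example features computed from the path and the matrix."""
    steps = np.diff(np.array(path), axis=0)
    if len(steps):
        diagonal_fraction = np.all(steps == 1, axis=1).mean()
    else:
        diagonal_fraction = 1.0
    return {
        "mean_path_distance": float(np.mean([dist[i, j] for i, j in path])),
        "off_diagonal_fraction": float(1.0 - diagonal_fraction),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(20, 13))  # 20 frames of 13-dim features
    student = rng.normal(size=(25, 13))  # 25 frames of 13-dim features
    dist = distance_matrix(teacher, student)
    cost, path = dtw_path(dist)
    print(cost, misalignment_features(dist, path))
```

In this sketch, a student utterance that matches the teacher frame for frame yields a purely diagonal path and zero path distance, while inserted or stretched segments show up as horizontal or vertical runs in the path. Features like these would then feed a separate regressor trained against human ratings, which is outside the scope of the sketch.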
Bibliographic reference. Lee, Ann / Glass, James (2013): "Pronunciation assessment via a comparison-based system", In SLaTE-2013, 122-126.