INTERSPEECH 2015
16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Automated Evaluation of Non-Native English Pronunciation Quality: Combining Knowledge- and Data-Driven Features at Multiple Time Scales

Matthew P. Black, Daniel Bone, Zisis Iason Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, Maarten Van Segbroeck, Jangwon Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan

University of Southern California, USA

Automatic evaluation of the pronunciation quality of non-native speech has seen tremendous success in both research and commercial settings, with applications in second-language (L2) learning. In this paper, submitted for the INTERSPEECH 2015 Degree of Nativeness Sub-Challenge, this problem is posed under a challenging cross-corpora setting using speech data drawn from multiple speakers with a variety of first-language (L1) backgrounds reading different English sentences. Since the perception of non-nativeness is realized at the segmental and suprasegmental linguistic levels, we explore a number of acoustic cues at multiple time scales. We experiment with both data-driven and knowledge-inspired features that capture degree of nativeness from pauses in speech, speaking rate, rhythm/stress, and goodness of phone pronunciation. One promising finding is that highly accurate automated assessment can be attained using a small, diverse set of intuitive and interpretable features. Performance is further boosted by smoothing scores across utterances from the same speaker; our best system significantly outperforms the challenge baseline.
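
The abstract does not say how scores are smoothed across a speaker's utterances. As an illustration only, the sketch below assumes one simple possibility: each utterance-level score is pulled toward the mean score of its speaker. The function name `smooth_by_speaker` and the blending weight `alpha` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def smooth_by_speaker(scores, speakers, alpha=0.5):
    """Blend each utterance score with the mean score of its speaker.

    Assumed scheme (not the paper's documented method):
        smoothed = (1 - alpha) * utterance_score + alpha * speaker_mean
    alpha = 0 leaves the scores unchanged; alpha = 1 replaces every
    score with its speaker's mean.
    """
    if len(scores) != len(speakers):
        raise ValueError("scores and speakers must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Group the utterance-level scores by speaker ID.
    by_speaker = defaultdict(list)
    for score, speaker in zip(scores, speakers):
        by_speaker[speaker].append(score)

    # One mean per speaker, computed over all of that speaker's utterances.
    speaker_mean = {spk: mean(vals) for spk, vals in by_speaker.items()}

    # Interpolate between each utterance score and its speaker mean.
    return [
        (1 - alpha) * score + alpha * speaker_mean[speaker]
        for score, speaker in zip(scores, speakers)
    ]
```

For example, `smooth_by_speaker(predicted_scores, speaker_ids, alpha=0.5)` would move each predicted nativeness score halfway toward its speaker's average. In practice a weight like `alpha` would be tuned on held-out data.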

Bibliographic reference. Black, Matthew P. / Bone, Daniel / Skordilis, Zisis Iason / Gupta, Rahul / Xia, Wei / Papadopoulos, Pavlos / Chakravarthula, Sandeep Nallan / Xiao, Bo / Van Segbroeck, Maarten / Kim, Jangwon / Georgiou, Panayiotis G. / Narayanan, Shrikanth S. (2015): "Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales", In INTERSPEECH-2015, 493-497.