Automatically evaluating the pronunciation quality of non-native speech has seen tremendous success in both research and commercial settings, with applications in second-language (L2) learning. In this paper, submitted to the INTERSPEECH 2015 Degree of Nativeness Sub-Challenge, the problem is posed in a challenging cross-corpora setting using speech data drawn from multiple speakers, from a variety of native-language (L1) backgrounds, reading different English sentences. Since the perception of non-nativeness is realized at both the segmental and suprasegmental linguistic levels, we explore a number of acoustic cues at multiple time scales. We experiment with both data-driven and knowledge-inspired features that capture the degree of nativeness from pauses in speech, speaking rate, rhythm/stress, and goodness of phone pronunciation. One promising finding is that highly accurate automated assessment can be attained using a small, diverse set of intuitive and interpretable features. Performance is further boosted by smoothing scores across utterances from the same speaker; our best system significantly outperforms the challenge baseline.
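As a rough illustration of the speaker-level score smoothing mentioned above, the Python sketch below replaces each utterance-level prediction with the mean prediction over all utterances from the same speaker. The paper does not publish code; the function name and the mean-pooling choice are assumptions for illustration only, not the authors' implementation.

    import numpy as np

    def smooth_scores_by_speaker(utt_scores, speaker_ids):
        # Hypothetical helper: replace each utterance-level nativeness score
        # with the mean score over all utterances from the same speaker.
        utt_scores = np.asarray(utt_scores, dtype=float)
        speaker_ids = np.asarray(speaker_ids)
        smoothed = np.empty_like(utt_scores)
        for spk in np.unique(speaker_ids):
            mask = speaker_ids == spk
            smoothed[mask] = utt_scores[mask].mean()
        return smoothed

    # Example: three utterances from speaker "A", two from speaker "B".
    scores   = [0.2, 0.4, 0.3, 0.8, 0.6]
    speakers = ["A", "A", "A", "B", "B"]
    print(smooth_scores_by_speaker(scores, speakers))  # -> [0.3 0.3 0.3 0.7 0.7]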
Bibliographic reference. Black, Matthew P. / Bone, Daniel / Skordilis, Zisis Iason / Gupta, Rahul / Xia, Wei / Papadopoulos, Pavlos / Chakravarthula, Sandeep Nallan / Xiao, Bo / Van Segbroeck, Maarten / Kim, Jangwon / Georgiou, Panayiotis G. / Narayanan, Shrikanth S. (2015): "Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales", In INTERSPEECH-2015, 493-497.