ISCA Archive Interspeech 2009

Automatically rating pronunciation through articulatory phonology

Joseph Tepperman, Louis Goldstein, Sungbok Lee, Shrikanth S. Narayanan

Articulatory Phonology's link between cognitive speech planning and the physical realizations of vocal tract constrictions has implications for speech acoustic and duration modeling that should be useful in assigning subjective ratings of pronunciation quality to nonnative speech. In this work, we compare traditional phoneme models used in automatic speech recognition to similar models for articulatory gestural pattern vectors, each with associated duration models. We find that, on the CDT corpus, gestural models outperform the phoneme-level baseline in terms of correlation with listener ratings, and that phoneme and gestural models in combination outperform either one alone. This also validates previous findings with a similar (but not gesture-based) pseudo-articulatory representation.

doi: 10.21437/Interspeech.2009-708

Cite as: Tepperman, J., Goldstein, L., Lee, S., Narayanan, S.S. (2009) Automatically rating pronunciation through articulatory phonology. Proc. Interspeech 2009, 2771-2774, doi: 10.21437/Interspeech.2009-708

