Articulatory Phonology links cognitive speech planning to the physical realization of vocal tract constrictions. This link has implications for acoustic and duration modeling of speech that should be useful in assigning subjective pronunciation-quality ratings to nonnative speech. In this work, we compare the traditional phoneme models used in automatic speech recognition with analogous models of articulatory gestural pattern vectors, each with an associated duration model. On the CDT corpus, the gestural models outperform the phoneme-level baseline in correlation with listener ratings, and the phoneme and gestural models combined outperform either one alone. These results also validate previous findings obtained with a similar (but not gesture-based) pseudo-articulatory representation.
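The evaluation described above, correlating automatic scores with listener ratings and then testing a combination of two score types, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation measure or combination method was used, so Pearson correlation and an average of z-normalized scores are stand-ins, and every number below is a made-up toy value rather than data from the CDT corpus.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def zscores(x):
    """Normalize scores to zero mean and unit (population) variance."""
    m = mean(x)
    s = sqrt(sum((a - m) ** 2 for a in x) / len(x))
    return [(a - m) / s for a in x]


def combine(a, b):
    """Assumed combination rule: average of the two z-normalized scores."""
    return [(p + q) / 2 for p, q in zip(zscores(a), zscores(b))]


# Hypothetical toy values, NOT results from the paper.
listener_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
phoneme_scores = [0.2, 0.1, 0.5, 0.4, 0.9]
gestural_scores = [0.1, 0.4, 0.3, 0.8, 0.7]

combined_scores = combine(phoneme_scores, gestural_scores)

for name, scores in [
    ("phoneme", phoneme_scores),
    ("gestural", gestural_scores),
    ("combined", combined_scores),
]:
    print(f"{name}: r = {pearson(scores, listener_ratings):.3f}")
```

Z-normalizing before averaging keeps either score type from dominating simply because of its scale; a regression fitted on held-out data would be another reasonable choice.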
Bibliographic reference. Tepperman, Joseph / Goldstein, Louis / Lee, Sungbok / Narayanan, Shrikanth S. (2009): "Automatically rating pronunciation through articulatory phonology", In INTERSPEECH-2009, 2771-2774.