Prosody can be used to infer whether candidates fully understand a passage they are reading aloud. In this paper, we focused on the automatic assessment of prosody in the read-aloud section of a high-stakes English test. We proposed a new method for handling the fundamental frequency (F0) of unvoiced segments; it significantly improved the predictive power of F0. The k-means clustering method was used to build canonical word-level contour models for F0 and energy. A direct comparison between a candidate's contours and these ideal contours strongly predicted the candidate's human-assigned prosody rating. Duration information at the phoneme level was an even better predictive feature. When the contours and duration information were combined, a correlation coefficient of r = 0.80 was obtained, which exceeded the correlation between human raters (r = 0.75). The results support the use of the new methods for evaluating prosody in high-stakes assessments.
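As a rough illustration of the word-level contour modelling summarized above, the following Python sketch clusters length-normalized contours with k-means and scores a candidate contour by its distance to the nearest cluster centre. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the resampling length, the initialization, the distance measure, or the treatment of unvoiced segments. The names `resample_contour`, `kmeans`, and `contour_distance`, the value `n_points=10`, the farthest-first initialization, and the Euclidean distance are all hypothetical choices made for this example, and the input contours are assumed to be gap-free already.

```python
import numpy as np


def resample_contour(values, n_points=10):
    """Linearly resample a variable-length contour to n_points samples.

    Assumption: the contour has no gaps (unvoiced frames already handled).
    """
    values = np.asarray(values, dtype=float)
    old_x = np.linspace(0.0, 1.0, num=len(values))
    new_x = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(new_x, old_x, values)


def _farthest_first_init(data, k):
    """Deterministic initialization: start at data[0], then repeatedly
    pick the point farthest from all centroids chosen so far."""
    centroids = [data[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(data - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(data[dists.argmax()])
    return np.array(centroids, dtype=float)


def _assign(data, centroids):
    """Index of the nearest centroid (Euclidean) for every row of data."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(data, k, n_iter=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    data = np.asarray(data, dtype=float)
    centroids = _farthest_first_init(data, k)
    for _ in range(n_iter):
        labels = _assign(data, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, _assign(data, centroids)


def contour_distance(candidate, centroids):
    """Euclidean distance from a candidate contour to the nearest
    canonical contour (smaller means closer to a reference pattern)."""
    resampled = resample_contour(candidate, n_points=centroids.shape[1])
    return float(np.linalg.norm(centroids - resampled, axis=1).min())


if __name__ == "__main__":
    # Synthetic F0 contours (Hz) for one word: some rising, some falling.
    rng = np.random.default_rng(0)
    reference = [np.linspace(100, 200, n) + rng.normal(0, 2, n) for n in (8, 12, 15)]
    reference += [np.linspace(200, 100, n) + rng.normal(0, 2, n) for n in (7, 13, 10)]
    data = np.stack([resample_contour(c) for c in reference])

    canonical, _ = kmeans(data, k=2)
    print("rising candidate:", contour_distance(np.linspace(100, 200, 20), canonical))
    print("flat candidate:  ", contour_distance(np.full(20, 150.0), canonical))
```

In a setup like this, the per-word distances would be one input feature among several; the abstract indicates that such contour features were combined with phoneme-level duration information, but how they were combined is not stated there.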
Bibliographic reference. Cheng, Jian (2011): "Automatic assessment of prosody in high-stakes English tests", In INTERSPEECH-2011, 1589-1592.