10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Language-Independent Feature Set for the Automatic Evaluation of Prosody

Andreas Maier, F. Hönig, V. Zeissler, Anton Batliner, E. Körner, N. Yamanaka, P. Ackermann, Elmar Nöth

FAU Erlangen-Nürnberg, Germany

In second language learning, the correct use of prosody plays a vital role. An automatic method to evaluate the naturalness of a speaker's prosody is therefore desirable. We present a novel method that models prosody independently of the text and thus independently of the language as well. For this purpose, the voiced and unvoiced speech segments are extracted, and a 187-dimensional feature vector is computed for each voiced segment. This approach is compared to word-based prosodic features on a German text passage. Both feature sets are compared with the perceptual evaluation of two native speakers of German. The word-based feature set yielded correlations of up to 0.92, while the text-independent feature set yielded 0.88. This is in the same range as the inter-rater correlation of 0.88. Furthermore, the text-independent features were computed for a Japanese translation of the passage, which was also rated by two native speakers of Japanese. Again, the correlation between the automatic system and the human perception of naturalness was high at 0.83 and not significantly lower than the inter-rater correlation of 0.92.
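The comparison between automatic scores and human ratings can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient, which the abstract does not specify, and it assumes the system is compared against the mean of the two raters. All rating values and the names `rater_a`, `rater_b`, and `system_score` are hypothetical and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical naturalness ratings for five speakers (illustration only).
rater_a = [1.0, 2.0, 3.0, 4.0, 5.0]
rater_b = [1.5, 2.0, 2.5, 4.5, 4.5]
# Hypothetical scores predicted by an automatic system for the same speakers.
system_score = [1.2, 2.4, 2.6, 4.1, 4.9]

# Assumed reference: the mean of both human raters per speaker.
reference = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

inter_rater = pearson(rater_a, rater_b)
system_vs_human = pearson(system_score, reference)

print(f"inter-rater correlation:     {inter_rater:.2f}")
print(f"system vs. human correlation: {system_vs_human:.2f}")
```

Comparing the system-to-human correlation with the inter-rater correlation, as done in the abstract, indicates whether the automatic system agrees with the raters about as well as the raters agree with each other.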

