ISCA Archive Interspeech 2009

A language-independent feature set for the automatic evaluation of prosody

Andreas Maier, F. Hönig, V. Zeissler, Anton Batliner, E. Körner, N. Yamanaka, P. Ackermann, Elmar Nöth

In second language learning, the correct use of prosody plays a vital role. Therefore, an automatic method to evaluate the naturalness of a speaker's prosody is desirable. We present a novel method to model prosody independently of the text, and thus independently of the language as well. For this purpose, the voiced and unvoiced speech segments are extracted and a 187-dimensional feature vector is computed for each voiced segment. This approach is compared to word-based prosodic features on a German text passage. Both are compared against the perceptual evaluation of two native speakers of German. The word-based feature set yielded correlations of up to 0.92, while the text-independent feature set yielded 0.88. This is in the same range as the inter-rater correlation of 0.88. Furthermore, the text-independent features were computed for a Japanese translation of the passage, which was also rated by two native speakers of Japanese. Again, the correlation between the automatic system and the human perception of naturalness was high at 0.83, not significantly lower than the inter-rater correlation of 0.92.
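The abstract names the main steps (voiced/unvoiced segmentation, a feature vector per voiced segment, correlation with human ratings) but not the 187 features themselves. The following is a minimal sketch of that general pipeline under stated assumptions. It assumes a frame-level F0 contour in which unvoiced frames are marked with 0 and a 10 ms frame shift. It computes a small, hypothetical subset of per-segment features that is not the paper's feature set, and it uses Pearson's coefficient to compare automatic scores with ratings, although the abstract does not say which correlation measure was used. The step that maps segment features to a per-speaker score is omitted.

```python
import math
from typing import List, Sequence, Tuple


def voiced_segments(f0: Sequence[float]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of voiced runs; end is exclusive.

    Assumes unvoiced frames are marked with F0 <= 0, as many pitch trackers do.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(f0):
        if value > 0 and start is None:
            start = i  # a voiced run begins
        elif value <= 0 and start is not None:
            segments.append((start, i))  # the voiced run ends before frame i
            start = None
    if start is not None:
        segments.append((start, len(f0)))  # run extends to the last frame
    return segments


def segment_features(
    f0: Sequence[float],
    energy: Sequence[float],
    start: int,
    end: int,
    frame_shift: float = 0.01,  # assumed 10 ms hop between frames
) -> List[float]:
    """Hypothetical 7-feature subset for one voiced segment (not the paper's 187)."""
    seg_f0 = list(f0[start:end])
    seg_en = list(energy[start:end])
    n = len(seg_f0)
    duration = n * frame_shift
    mean_f0 = sum(seg_f0) / n
    mean_en = sum(seg_en) / n
    # Least-squares slope of F0 over time (Hz per second); 0 for one-frame segments.
    times = [i * frame_shift for i in range(n)]
    mean_t = sum(times) / n
    den = sum((t - mean_t) ** 2 for t in times)
    num = sum((t - mean_t) * (v - mean_f0) for t, v in zip(times, seg_f0))
    slope = num / den if den > 0 else 0.0
    return [duration, mean_f0, min(seg_f0), max(seg_f0),
            max(seg_f0) - min(seg_f0), mean_en, slope]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between automatic scores and one rater's scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because the segmentation only looks at whether F0 is present, nothing in this sketch depends on a transcript or on word boundaries. This is the property that lets a text-independent feature set be applied to German and Japanese alike. In practice, the F0 and energy contours would come from a pitch tracker, and plain lists are used here only to keep the sketch dependency-free.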

doi: 10.21437/Interspeech.2009-216

Cite as: Maier, A., Hönig, F., Zeissler, V., Batliner, A., Körner, E., Yamanaka, N., Ackermann, P., Nöth, E. (2009) A language-independent feature set for the automatic evaluation of prosody. Proc. Interspeech 2009, 600-603, doi: 10.21437/Interspeech.2009-216

  author={Andreas Maier and F. Hönig and V. Zeissler and Anton Batliner and E. Körner and N. Yamanaka and P. Ackermann and Elmar Nöth},
  title={{A language-independent feature set for the automatic evaluation of prosody}},
  booktitle={Proc. Interspeech 2009},