11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Comparison of Approaches for Instrumentally Predicting the Quality of Text-to-Speech Systems

Sebastian Möller (1), Florian Hinterleitner (1), Tiago H. Falk (2), Tim Polzehl (1)

(1) Deutsche Telekom Laboratories, Germany
(2) Bloorview Research Institute, Canada

In this paper, we compare and combine different approaches for instrumentally predicting the perceived quality of Text-to-Speech systems. First, a log-likelihood is determined by comparing features extracted from the synthesized speech signal with features trained on natural speech. Second, parameters are extracted which capture quality-relevant degradations of the synthesized speech signal. Both approaches are combined and evaluated on three auditory test databases. The results show that auditory quality judgments can in many cases be predicted with a sufficiently high accuracy and reliability, but that there are considerable differences, mainly between male and female speech samples.
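
As a rough illustration of the first approach, the following minimal sketch scores feature frames against a reference model fitted to natural speech. The abstract does not state which features or which statistical model are used, so everything here is an assumption made for illustration: the single diagonal-covariance Gaussian, the toy two-dimensional features, and the hypothetical helper names `fit_diag_gaussian` and `avg_log_likelihood`.

```python
import math


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from reference feature frames.

    Assumption: one diagonal-covariance Gaussian stands in for the
    natural-speech reference model; the abstract does not name the model.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def avg_log_likelihood(frames, means, variances):
    """Mean per-frame log-likelihood of the frames under the Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


# Toy 2-dimensional feature frames (made-up numbers, not data from the paper).
natural = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]]
close_to_natural = [[0.05, 0.95], [0.1, 1.0]]
far_from_natural = [[2.0, -1.0], [2.5, -0.5]]

means, variances = fit_diag_gaussian(natural)
ll_close = avg_log_likelihood(close_to_natural, means, variances)
ll_far = avg_log_likelihood(far_from_natural, means, variances)
```

In this sketch, frames that resemble the natural-speech reference receive a higher average log-likelihood than frames that deviate from it, which is the property that would let such a score serve as one input to a quality predictor.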


Bibliographic reference.  Möller, Sebastian / Hinterleitner, Florian / Falk, Tiago H. / Polzehl, Tim (2010): "Comparison of approaches for instrumentally predicting the quality of text-to-speech systems", In INTERSPEECH-2010, 1325-1328.