Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

Is Intelligibility Still the Main Problem? A Review of Perceptual Quality Dimensions of Synthetic Speech

Florian Hinterleitner (1), Christoph Norrenbrock (2), Sebastian Möller (1)

(1) TU Berlin, Germany; (2) CAU Kiel, Germany

In this paper, we present a comparative overview of nine studies on perceptual quality dimensions of synthetic speech. Each study used one of two subjective assessment techniques to evaluate the text-to-speech (TTS) stimuli: in a semantic differential, the test participants rate every stimulus on a given set of rating scales, while in a paired comparison test, the subjects rate the similarity of pairs of stimuli. Perceptual quality dimensions can be derived from the results of both test methods, either by performing a factor analysis or via multidimensional scaling. We show that even though the nine tests differ in the synthesizer types used, stimulus duration, language, and quality assessment method, the resulting perceptual quality dimensions can be linked to five universal quality dimensions of synthetic speech: (i) naturalness of voice, (ii) prosodic quality, (iii) fluency and intelligibility, (iv) disturbances, and (v) calmness.

Index Terms: text-to-speech (TTS), perceptual quality dimensions, evaluation
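As an illustration of the multidimensional scaling route mentioned in the abstract, the following is a minimal sketch of classical (Torgerson) MDS applied to paired-comparison dissimilarities. It is not the procedure of any of the reviewed studies; the function name `classical_mds` and the 4x4 `ratings` matrix are hypothetical and chosen only so that the result is easy to check.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical averaged dissimilarity ratings for four stimuli
# (illustrative values only, not data from the paper).
ratings = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
])

coords = classical_mds(ratings, n_dims=2)
print(coords)
```

Each column of `coords` is one candidate perceptual dimension. Interpreting a dimension (for example as naturalness or prosodic quality) still requires relating it to attribute ratings or listener descriptions, which this sketch does not attempt.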

Bibliographic reference. Hinterleitner, Florian / Norrenbrock, Christoph / Möller, Sebastian (2013): "Is intelligibility still the main problem? A review of perceptual quality dimensions of synthetic speech", in SSW8, 147-151.