ISCA Archive SSW 2013

Is intelligibility still the main problem? a review of perceptual quality dimensions of synthetic speech

Florian Hinterleitner, Christoph Norrenbrock, Sebastian Möller

In this paper, we present a comparative overview of 9 studies on perceptual quality dimensions of synthetic speech. The tests used different subjective assessment techniques to evaluate the text-to-speech (TTS) stimuli: in a semantic differential, the test participants rate every stimulus on a given set of rating scales, while in a paired comparison test, the subjects rate the similarity of pairs of stimuli. Perceptual quality dimensions can be derived from the results of both test methods, by factor analysis for the rating-scale data and by multidimensional scaling for the similarity data. We show that even though the 9 tests differ in synthesizer type, stimulus duration, language, and quality assessment method, the resulting perceptual quality dimensions can be linked to 5 universal quality dimensions of synthetic speech: (i) naturalness of voice, (ii) prosodic quality, (iii) fluency and intelligibility, (iv) disturbances, and (v) calmness.

Index Terms: text-to-speech (TTS), perceptual quality dimensions, evaluation
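
As an illustration of how perceptual dimensions can be obtained from paired-comparison data, the following is a minimal sketch of classical (Torgerson) multidimensional scaling in plain Python. It is not taken from the paper, and the reviewed studies may have used other MDS variants. The names `classical_mds`, `n_dims`, and `n_iter` are hypothetical, the toy dissimilarity matrix is invented for illustration, and power iteration with deflation stands in for a library eigen-solver. The sketch assumes a symmetric dissimilarity matrix with a zero diagonal.

```python
import math


def classical_mds(dissim, n_dims=2, n_iter=200):
    """Embed n stimuli in n_dims dimensions so that Euclidean distances
    approximate the symmetric dissimilarity matrix `dissim`."""
    n = len(dissim)
    # Squared dissimilarities.
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Row means double as column means because the matrix is symmetric.
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        # Power iteration for the dominant eigenpair (deterministic start).
        v = [1.0 + 0.1 * i for i in range(n)]
        eigval = 0.0
        for _ in range(n_iter):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
            eigval = sum(vi * wi for vi, wi in zip(v, matvec(b, v)))
        if eigval <= 1e-12:
            # No positive variance left; remaining dimensions stay at zero.
            break
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflation: remove the component just extracted.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy example: three stimuli with pairwise dissimilarities 3, 4, and 5.
triangle = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = classical_mds(triangle, n_dims=2)
```

Because the toy dissimilarities form a valid Euclidean triangle, the pairwise distances between the rows of `points` reproduce them up to rounding. Real listener judgments are rarely exactly Euclidean, in which case the embedding is only an approximation and the extracted axes still have to be interpreted, for example by relating them to rating-scale data.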


Cite as: Hinterleitner, F., Norrenbrock, C., Möller, S. (2013) Is intelligibility still the main problem? a review of perceptual quality dimensions of synthetic speech. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 147-151

@inproceedings{hinterleitner13_ssw,
  author={Florian Hinterleitner and Christoph Norrenbrock and Sebastian Möller},
  title={{Is intelligibility still the main problem? a review of perceptual quality dimensions of synthetic speech}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={147--151}
}