ISCA Archive SSW 2023

Re-examining the quality dimensions of synthetic speech

Fritz Seebauer, Michael Kuhlmann, Reinhold Haeb-Umbach, Petra Wagner

The aim of this paper is to generate a more comprehensive framework for evaluating synthetic speech. To this end, a line of tests resulting in an exploratory factor analysis (EFA) has been carried out. The proposed dimensions that encapsulate the construct of “synthetic speech quality” are: “human-likeness”, “audio quality”, “negative emotion”, “dominance”, “positive emotion”, “calmness”, “seniority” and “gender”, with item-to-total correlations pointing towards “gender” being an orthogonal construct. A subsequent analysis of common acoustic features found in the forensic and phonetic literature reveals very weak correlations with the proposed scales. Inter-rater and inter-item agreement measures additionally reveal low consistency within the scales. We also make the case that there is a need for a more fine-grained approach when investigating the quality of synthetic speech systems, and propose a method that attempts to capture individual quality dimensions in the time domain.

doi: 10.21437/SSW.2023-6

Cite as: Seebauer, F., Kuhlmann, M., Haeb-Umbach, R., Wagner, P. (2023) Re-examining the quality dimensions of synthetic speech. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 34-40, doi: 10.21437/SSW.2023-6

  author={Fritz Seebauer and Michael Kuhlmann and Reinhold Haeb-Umbach and Petra Wagner},
  title={{Re-examining the quality dimensions of synthetic speech}},
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},