Modern text-to-speech (TTS) models are typically subjectively evaluated using an Absolute Category Rating (ACR) method. This method uses the mean opinion score to rate each model under test. However, if the models are perceptually too similar, assigning absolute ratings to stimuli can be difficult and prone to subjective preference errors. Pairwise comparison tests offer relative comparisons and better capture subtle differences between the stimuli. However, pairwise comparisons take more time, as the number of tests grows quadratically with the number of models. Alternatively, a ranking-by-elimination (RBE) test can assess multiple models, offering benefits similar to pairwise comparisons for subtle differences across models, without the time penalty. We compared the ACR and RBE tests for TTS evaluation in a controlled experiment. We found that the obtained results were statistically similar, even in the presence of perceptually close TTS models.
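To illustrate why pairwise comparisons become costly, the number of comparisons for n models is C(n, 2) = n(n - 1)/2. The following sketch (not from the paper) tabulates this growth; the trial counts are plain combinatorics, while the contrast with a single RBE trial reflects the abstract's claim that RBE assesses all models at once.

```python
from math import comb

# Pairwise comparison tests needed when every pair of models is
# compared once: C(n, 2) = n * (n - 1) / 2 (quadratic growth).
for n in [3, 5, 10, 20]:
    print(f"{n} models -> {comb(n, 2)} pairwise tests per listener and stimulus")

# By contrast, a ranking-by-elimination (RBE) trial presents all n
# stimuli together, so one trial yields a full relative ordering.
```

For example, doubling the model count from 10 to 20 raises the pairwise test count from 45 to 190, which motivates the RBE alternative studied in the paper.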
Cite as: Kayyar, K., Dittmar, C., Pia, N., Habets, E. (2023) Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 191-196, doi: 10.21437/SSW.2023-30
@inproceedings{kayyar23_ssw,
  author={Kishor Kayyar and Christian Dittmar and Nicola Pia and Emanuel Habets},
  title={{Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={191--196},
  doi={10.21437/SSW.2023-30}
}