ISCA Archive SSW 2023

Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS

Harm Lameris, Ambika Kirkland, Joakim Gustafson, Eva Szekely

Speech synthesis evaluation methods have lagged behind the development of TTS systems, with single sentence read-speech MOS naturalness evaluation on crowdsourcing platforms being the industry standard. For TTS to successfully be applied in social contexts, evaluation methods need to be socially embedded in the situation where they will be deployed. Due to the time and cost constraints of conducting an in-person interaction evaluation for TTS, we examine the effect of introducing situational context and preceding sentence context to participants in a subjective listening experiment. We conduct a suitability evaluation for a robot game guide that explains game rules to participants using two synthesized spontaneous voices: an instruction-specific and a general spontaneous voice. Results indicate that the inclusion of context influences user ratings, highlighting the need for context-aware evaluations. However, the type of context did not significantly affect the results.


doi: 10.21437/SSW.2023-11

Cite as: Lameris, H., Kirkland, A., Gustafson, J., Szekely, E. (2023) Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 69-74, doi: 10.21437/SSW.2023-11

@inproceedings{lameris23_ssw,
  author={Harm Lameris and Ambika Kirkland and Joakim Gustafson and Eva Szekely},
  title={{Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={69--74},
  doi={10.21437/SSW.2023-11}
}