ISCA Archive Interspeech 2023

Cross-linguistic Emotion Perception in Human and TTS Voices

Iona Gessinger, Michelle Cohn, Benjamin R. Cowan, Georgia Zellou, Bernd Möbius

This study investigates how German listeners perceive changes in the emotional expression of human voices and Amazon Alexa text-to-speech (TTS) voices in German and American English. Participants rated sentences containing emotionally neutral lexico-semantic information that were resynthesized to vary in prosodic emotional expressiveness. Starting from an emotionally neutral production, three levels of increasing 'happiness' were created. Results show that 'happiness' manipulations lead to higher ratings of emotional valence (i.e., more positive) and arousal (i.e., more excited) for both German and English voices, with stronger effects for the German voices. In particular, changes in valence were perceived more prominently in German TTS than in English TTS. Additionally, both TTS voices were rated lower than the respective human voices on scales that reflect anthropomorphism (e.g., human-likeness). We discuss these findings in the context of cross-linguistic emotion accounts.

doi: 10.21437/Interspeech.2023-711

Cite as: Gessinger, I., Cohn, M., Cowan, B.R., Zellou, G., Möbius, B. (2023) Cross-linguistic Emotion Perception in Human and TTS Voices. Proc. INTERSPEECH 2023, 5222-5226, doi: 10.21437/Interspeech.2023-711

  author={Iona Gessinger and Michelle Cohn and Benjamin R. Cowan and Georgia Zellou and Bernd Möbius},
  title={{Cross-linguistic Emotion Perception in Human and TTS Voices}},
  booktitle={Proc. INTERSPEECH 2023},