ISCA Archive Interspeech 2017

Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car

Jaime Lorenzo-Trueba, Cassia Valentini Botinhao, Gustav Eje Henter, Junichi Yamagishi

This paper analyzes a) how often listeners misinterpret the emotional content of an utterance when listening to vocoded or natural speech in adverse conditions; b) which noise conditions cause the most misperceptions; and c) which groups of listeners misinterpret emotions the most. The long-term goal is to construct new emotional speech synthesizers that adapt to the environment and to the listener. We performed a large-scale listening test in which over 400 listeners between the ages of 21 and 72 assessed natural and vocoded acted emotional speech stimuli. The stimuli had been artificially degraded using a room impulse response recorded in a car and various types of in-car noise recorded in a real car. Experimental results show that emotion recognition rates and perceived emotional strength both degrade as the signal-to-noise ratio decreases. Interestingly, misperceptions seem to be more pronounced for negative and low-arousal emotions such as calmness or anger, while positive emotions such as happiness appear to be more robust to noise. An ANOVA of listener meta-data further revealed that gender and age also influenced results, with elderly male listeners most likely to misidentify emotions.
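
The degradation described above (reverberation with a room impulse response, then additive noise at a chosen signal-to-noise ratio) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `degrade`, its parameters, the noise looping, and the use of plain mean power to define SNR are all assumptions made for illustration, and the abstract does not state which SNR levels or level-measurement method were used.

```python
import numpy as np

def degrade(speech, rir, noise, snr_db):
    """Hypothetical helper: reverberate speech, then add noise at snr_db.

    All inputs are 1-D float arrays at the same sampling rate (assumed).
    Returns (mixture, reverberant_speech, scaled_noise).
    """
    # Linear convolution with the impulse response, cut to the input length.
    reverberant = np.convolve(speech, rir)[: len(speech)]

    # Loop the noise recording if it is shorter than the speech, then trim.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Choose a gain so that 10*log10(P_speech / P_noise) equals snr_db,
    # with power taken as the mean squared amplitude (an assumption).
    p_speech = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))

    scaled_noise = gain * noise
    return reverberant + scaled_noise, reverberant, scaled_noise
```

Lowering `snr_db` raises the noise gain relative to the reverberant speech, which is the direction in which the abstract reports recognition rates and perceived emotional strength degrading.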


doi: 10.21437/Interspeech.2017-532

Cite as: Lorenzo-Trueba, J., Botinhao, C.V., Henter, G.E., Yamagishi, J. (2017) Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car. Proc. Interspeech 2017, 606-610, doi: 10.21437/Interspeech.2017-532

@inproceedings{lorenzotrueba17_interspeech,
  author={Jaime Lorenzo-Trueba and Cassia Valentini Botinhao and Gustav Eje Henter and Junichi Yamagishi},
  title={{Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={606--610},
  doi={10.21437/Interspeech.2017-532}
}