ISCA Archive SSW 2021

Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech

Sai Sirisha Rallabandi, Babak Naderi, Sebastian Möller

The advent of neural Text-to-Speech (TTS) synthesizers has enhanced the expressivity of synthetic speech in recent years. However, there is very little work on understanding the acoustic correlates of paralinguistic traits, emotions, speaker attributes and characteristics in synthetic speech. This paper investigates the acoustic correlates of three speaker attributes: likeability, friendliness, and skilfulness. Our study was carried out on voices from two commercial TTS systems, Amazon Polly (9 voices) and the Google TTS engine (10 voices). In our previous study, we performed a crowdsourcing-based evaluation to collect subjective ratings for these speaker attributes. In this work, we identify the acoustic features that predict those ratings using the backward elimination algorithm. We show that the level of loudness, spectral flux, fundamental frequency, formant frequencies, and their combinations contribute to the speaker attributes under study. We further combine the ratings of friendliness and likeability to investigate the characteristic of warmth in synthetic speech; correspondingly, skilfulness represents the characteristic of competence.
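
The backward elimination step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the stopping criterion, so the rule used here (drop the weakest predictor while its absolute t-statistic is below 2.0) is an assumption, as are the function name `backward_elimination`, the feature names, and the synthetic data, which stand in for the real acoustic features and crowdsourced ratings.

```python
import numpy as np


def backward_elimination(X, y, names, t_threshold=2.0):
    """Repeatedly drop the weakest predictor until all have |t| >= t_threshold.

    The threshold of 2.0 is an assumed rule of thumb, not a value from the paper.
    """
    keep = list(range(X.shape[1]))
    while keep:
        # Design matrix: intercept column plus the features still in the model.
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)

        # Standard errors of the ordinary least squares coefficients.
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.inv(A.T @ A)
        t_stats = np.abs(beta / np.sqrt(np.diag(cov)))[1:]  # skip the intercept

        worst = int(np.argmin(t_stats))
        if t_stats[worst] >= t_threshold:
            break  # every remaining feature passes the threshold
        keep.pop(worst)
    return [names[i] for i in keep]


# Synthetic stand-in data: only the first and third columns drive the rating.
rng = np.random.default_rng(0)
names = ["loudness", "spectral_flux", "f0", "f1"]  # hypothetical feature labels
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

selected = backward_elimination(X, y, names)
print(selected)
```

On this toy data the two informative columns are always retained because their coefficients are far larger than the noise; whether an uninformative column survives depends on chance and on the chosen threshold.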


doi: 10.21437/SSW.2021-1

Cite as: Rallabandi, S.S., Naderi, B., Möller, S. (2021) Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 1-6, doi: 10.21437/SSW.2021-1

@inproceedings{rallabandi21_ssw,
  author={Sai Sirisha Rallabandi and Babak Naderi and Sebastian Möller},
  title={{Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={1--6},
  doi={10.21437/SSW.2021-1}
}