ISCA Archive SSW 2021

Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis

Joakim Gustafson, Jonas Beskow, Eva Szekely

Studies on human-human interactions have shown that the fluency of a speaker influences the perception of personality. Adding fillers and discourse markers can make the speaker seem uncertain, more casual and spontaneous. With recent TTS developments it is now possible to investigate whether the same holds for artificial speakers. In a previous experiment, it was shown that local insertion of fillers in a regular TTS voice influenced the perceived personality. In the current study we extend that work in two ways. Firstly, we recreate the English experiment, adding a voice trained on spontaneous speech, where adding fillers also has a global effect on the synthesized speech; we also add Swedish read and spontaneous voices. Secondly, for the Swedish voices, we investigate the effect of using a multispeaker model that mixes a read speech voice and a spontaneous speech voice when generating disfluent synthetic speech.


doi: 10.21437/SSW.2021-9

Cite as: Gustafson, J., Beskow, J., Szekely, E. (2021) Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 48-53, doi: 10.21437/SSW.2021-9

@inproceedings{gustafson21_ssw,
  author={Joakim Gustafson and Jonas Beskow and Eva Szekely},
  title={{Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={48--53},
  doi={10.21437/SSW.2021-9}
}