ISCA Archive SSW 2021

Perception of smiling voice in spontaneous speech synthesis

Ambika Kirkland, Marcin Włodarczak, Joakim Gustafson, Eva Szekely

Smiling during speech production has been shown to result in perceptible acoustic differences compared to non-smiling speech. However, there is a scarcity of research on the perception of “smiling voice” in synthesized spontaneous speech. In this study, we used a sequence-to-sequence neural text-to-speech system built on conversational data to produce utterances with the characteristics of spontaneous speech. Segments of speech following laughter, and the same utterances not preceded by laughter, were compared in a perceptual experiment after removing laughter and/or breaths from the beginning of the utterance to determine whether participants perceive the utterances preceded by laughter as sounding as if they were produced while smiling. The results showed that participants identified the post-laughter speech as smiling at a rate significantly greater than chance. Furthermore, the effect of content (positive/neutral/negative) was investigated. These results show that laughter, a spontaneous, non-elicited phenomenon in our model’s training data, can be used to synthesize expressive speech with the perceptual characteristics of smiling.


doi: 10.21437/SSW.2021-19

Cite as: Kirkland, A., Włodarczak, M., Gustafson, J., Szekely, E. (2021) Perception of smiling voice in spontaneous speech synthesis. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 108-112, doi: 10.21437/SSW.2021-19

@inproceedings{kirkland21_ssw,
  author={Ambika Kirkland and Marcin Włodarczak and Joakim Gustafson and Eva Szekely},
  title={{Perception of smiling voice in spontaneous speech synthesis}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={108--112},
  doi={10.21437/SSW.2021-19}
}