ISCA Archive Interspeech 2006

Cues for hesitation in speech synthesis

Rolf Carlson, Kjell Gustafson, Eva Strangert

The current study investigates acoustic correlates of perceived hesitation, building on previous work showing that pause duration and final lengthening both contribute to the perception of hesitation. It is the total duration increase, rather than the contribution of either factor alone, that is the valid cue. The present experiment, using speech synthesis, was designed to evaluate F0 slope and the presence vs. absence of creaky voice before the inserted hesitation, in addition to durational cues. The manipulations occurred in two syntactic positions: within a phrase and between two phrases. The results showed that, in addition to the durational increase, variation in both F0 slope and creaky voice had perceptual effects, although to a much lesser degree. The results have a bearing on efforts to model spontaneous speech, including disfluencies, which may be explored, for example, in spoken dialogue systems.


doi: 10.21437/Interspeech.2006-382

Cite as: Carlson, R., Gustafson, K., Strangert, E. (2006) Cues for hesitation in speech synthesis. Proc. Interspeech 2006, paper 1516-Tue3BuP.2, doi: 10.21437/Interspeech.2006-382

@inproceedings{carlson06_interspeech,
  author={Rolf Carlson and Kjell Gustafson and Eva Strangert},
  title={{Cues for hesitation in speech synthesis}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1516-Tue3BuP.2},
  doi={10.21437/Interspeech.2006-382}
}