ISCA Archive SSW 2010

Utilising spontaneous conversational speech in HMM-based speech synthesis

Sebastian Andersson, Junichi Yamagishi, Robert A. J. Clark

Spontaneous conversational speech has many characteristics that are currently not well modelled in unit selection and HMM-based speech synthesis. However, to build synthetic voices more suitable for interaction, we need data that exhibits more conversational characteristics than the read-aloud sentences generally used. In this paper we show how carefully selected utterances from a spontaneous conversation were instrumental in building an HMM-based synthetic voice with more natural-sounding conversational characteristics than a voice based on carefully read-aloud sentences. We also investigated a style blending technique as a solution to the inherent problem of phonetic coverage in spontaneous speech data. However, the lack of an appropriate representation of spontaneous speech phenomena probably contributed to results showing that we could not yet compete with the speech quality achieved for grammatical sentences.

Index Terms: HMM, speech synthesis, spontaneous, conversation, lexical fillers, filled pauses


Cite as: Andersson, S., Yamagishi, J., Clark, R.A.J. (2010) Utilising spontaneous conversational speech in HMM-based speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 173-178

@inproceedings{andersson10_ssw,
  author={Sebastian Andersson and Junichi Yamagishi and Robert A. J. Clark},
  title={{Utilising spontaneous conversational speech in HMM-based speech synthesis}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={173--178}
}