To produce natural-sounding output, speech synthesis systems must properly model the acoustic variability in the corpus. Creaky voice is a voice quality that occurs frequently in many languages, in both read and conversational speech. However, the creaky excitation displays acoustic characteristics that differ from those of modal excitation and is therefore not suitably modeled by standard vocoders. This study presents an analysis of the creaky excitation, which is used to derive an extension of the Deterministic plus Stochastic Model of the residual signal. The proposed model is designed to model creaky voice appropriately and is integrated into a vocoder for parametric speech synthesis. Analysis-synthesis versions of short speech segments containing creaky voice were used in a subjective listening test, which revealed a clearly better rendering of the voice quality than that obtained with a standard vocoder.
Index Terms: Voice quality, speech synthesis, creak, vocal fry
Bibliographic reference. Drugman, Thomas / Kane, John / Gobl, Christer (2012): "Modeling the creaky excitation for parametric speech synthesis", In INTERSPEECH-2012, 1424-1427.