ITRW on Speech and Emotion
September 5-7, 2000
Concatenative speech synthesis is increasing in popularity because it offers higher-quality output than earlier formant synthesisers. However, because it is based on recorded speech units, concatenative synthesis offers less parametric control during resynthesis. Consequently, adding pragmatic effects such as different speaking styles and emotions at the synthesis stage is fundamentally more difficult than with formant synthesis.
This paper describes the results of a preliminary attempt to add emotion to concatenative synthetic speech (using BT's Laureate synthesiser), initially using techniques already applied successfully to formant synthesis. A new intonation contour (including both pitch and duration changes) was applied to the concatenated segments during production of the final audible utterance, and some of the available synthesis parameters were systematically modified to increase the affective content. The output digital speech samples were then subjected to further manipulation with a waveform editing package to produce the final output utterance. This process yielded only a small number of manually produced utterances, but these showed that affective manipulation is possible on this type of synthesiser.
Further work has produced rule-based implementations which allow automatic production of emotional utterances. Development of these systems will be described, and some initial results from listener studies will be presented.
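The idea of a rule-based prosodic modification can be illustrated with a minimal Python sketch. It assumes each concatenated segment carries a duration and a few pitch targets, and that an emotion rule scales the overall pitch level, the pitch range around the utterance mean, and the speech rate. The names `Segment`, `EmotionRule`, `RULES` and `apply_rule` are hypothetical, the numeric rule values are placeholders rather than the rules used in this work, and nothing here reflects the actual Laureate interface.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class Segment:
    phone: str          # phone label (hypothetical representation)
    duration_ms: float  # segment duration in milliseconds
    f0_hz: tuple        # pitch targets within the segment, in Hz


@dataclass(frozen=True)
class EmotionRule:
    pitch_level: float  # multiplier applied to the utterance mean pitch
    pitch_range: float  # multiplier applied to deviations from the mean pitch
    rate: float         # >1 speeds speech up (shortens durations)


# Placeholder values for illustration only; NOT the rules from the paper.
RULES = {
    "neutral": EmotionRule(pitch_level=1.0, pitch_range=1.0, rate=1.0),
    "excited": EmotionRule(pitch_level=1.2, pitch_range=1.5, rate=1.1),
    "sad": EmotionRule(pitch_level=0.9, pitch_range=0.6, rate=0.8),
}


def apply_rule(segments, rule):
    """Return new segments with pitch and duration modified by `rule`."""
    # Centre of the pitch contour, computed over every target in the utterance.
    centre = mean(f for s in segments for f in s.f0_hz)
    out = []
    for s in segments:
        # Shift the mean pitch level and stretch/compress the range around it.
        new_f0 = tuple(
            centre * rule.pitch_level + (f - centre) * rule.pitch_range
            for f in s.f0_hz
        )
        # Faster rate -> shorter segments; slower rate -> longer segments.
        out.append(replace(s, duration_ms=s.duration_ms / rule.rate, f0_hz=new_f0))
    return out


# Usage: a toy three-segment utterance with a mean pitch of 120 Hz.
utterance = [
    Segment("h", 60.0, (110.0,)),
    Segment("e", 120.0, (120.0, 130.0)),
    Segment("l", 80.0, (120.0,)),
]
sad = apply_rule(utterance, RULES["sad"])
```

Scaling deviations about the utterance mean, rather than scaling every pitch value directly, keeps level and range as separate controls, which is convenient when different emotions are thought to affect them differently.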
Bibliographic reference. Murray, Iain R. / Edgington, Mike D. / Campion, Diane / Lynn, Justin (2000): "Rule-based emotion synthesis using concatenated speech", In SpeechEmotion-2000, 173-177.