The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis
The synthesis of listener vocalisations is an active research area aimed at improving emotionally coloured conversational speech synthesis. To communicate different intentions, a synthesiser should be capable of generating a broad range of vocalisations with different kinds of acoustic properties. However, the data collected for corpus-based methods is necessarily limited in acoustic variability. This paper describes our approach to increasing the acoustic variability of vocalisations in terms of intonation. After selecting the best candidate for a given target from among the available vocalisations, we use prosody modification techniques to impose a target intonation contour. In an experiment, we combine markedly distinct intonation contours with vocalisations differing in segmental form, using the prosody modification techniques MLSA vocoding, FD-PSOLA, and HNM. In a listening test, we evaluate the perceived naturalness of the resulting synthesised vocalisations, and assess the effect of segmental form, intonation contour, and modification technique on perceived meaning.
Index Terms: listener vocalisations, pitch modification, FD-PSOLA, HNM, MLSA vocoding
Bibliographic reference. Pammi, Sathish / Schröder, Marc / Charfuelan, Marcela / Türk, Oytun / Steiner, Ingmar (2010): "Synthesis of listener vocalisations with imposed intonation contours", In SSW7-2010, 240-245.