The approach outlined in this paper aims to improve the expressivity of unit selection TTS for dialog applications while retaining the natural-sounding voice quality typical of unit selection synthesis. A small set of speech acts was used to annotate a corpus recorded by one female speaker of US English. The corpus consisted primarily of speech read from interactive dialogs of various kinds. Global acoustic variables related to prosody were calculated for each speech act in the corpus. A hierarchical cluster analysis of these acoustic variables showed clustering that corresponded to general classes of dialog speech acts. The acoustic prosodic variables were then used to specify pitch range parameters of a unit selection Speech Act TTS voice. Listening tests indicated a large and significant improvement in rated speech quality for the Speech Act system compared to the Standard TTS system built from the same speaker.
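The hierarchical cluster analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the linkage method, the distance metric, or the specific acoustic variables, so average linkage, Euclidean distance, and the two features (mean F0 and F0 range) are assumptions. The speech act labels and numeric values are toy data invented for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster point pairs."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Toy values invented for illustration (NOT taken from the paper).
# Hypothetical speech act label -> (mean F0 in Hz, F0 range in Hz)
toy = {
    "greeting": (230.0, 120.0),
    "thanking": (225.0, 110.0),
    "question_yn": (210.0, 90.0),
    "question_wh": (205.0, 85.0),
    "inform": (190.0, 60.0),
    "confirm": (188.0, 55.0),
}

labels = list(toy)
points = [toy[name] for name in labels]
groups = [[labels[i] for i in c] for c in agglomerate(points, 3)]
print(groups)
```

In practice the features would likely be normalized before clustering, since variables measured on different scales would otherwise dominate the Euclidean distance.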
Index Terms: speech synthesis, dialog, speech acts, prosody
Cite as: Syrdal, A.K., Conkie, A., Kim, Y.-J., Beutnagel, M.C. (2010) Speech acts and dialog TTS. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 179-183
@inproceedings{syrdal10_ssw, author={Ann K. Syrdal and Alistair Conkie and Yeon-Jun Kim and Mark C. Beutnagel}, title={{Speech acts and dialog TTS}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={179--183} }