12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Enriching Text-to-Speech Synthesis Using Automatic Dialog Act Tags

Vivek Kumar Rangarajan Sridhar, Ann Syrdal, Alistair D. Conkie, Srinivas Bangalore

AT&T Labs Research, USA

We present an approach for enriching dialog-based text-to-speech (TTS) synthesis systems by explicitly controlling expressiveness through dialog act tags. The dialog act tags in our framework are obtained automatically from a maximum entropy classifier trained on the Switchboard-DAMSL data set, which is unrelated to the TTS database. We compare the voice quality obtained with automatic dialog act tags against that obtained with human annotations of dialog acts, and against two forms of reference databases. Even though the tag inventories of the automatic tagger and the human annotation differ, exploiting either form of dialog markup yields better voice quality than the reference voices in subjective evaluation.
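As a rough illustration of the tagging step, a maximum entropy classifier is equivalent to multinomial logistic regression over utterance features. The sketch below is a minimal toy version in plain Python. The training utterances, the three-tag inventory, the bag-of-words and first-word features, and all function names are assumptions made for illustration; they are not the features, tag set, or training setup used with Switchboard-DAMSL in this work.

```python
import math
from collections import defaultdict

# Hypothetical toy data; a real tagger would be trained on Switchboard-DAMSL.
TRAIN = [
    ("do you like jazz", "question"),
    ("what time is it", "question"),
    ("i like jazz", "statement"),
    ("it is late", "statement"),
    ("uh huh", "backchannel"),
    ("yeah right", "backchannel"),
]


def features(utterance):
    """Illustrative features: one per word, plus the utterance-initial word."""
    words = utterance.split()
    feats = ["w=" + w for w in words]
    feats.append("first=" + words[0])
    return feats


def probabilities(weights, labels, feats):
    """Softmax over the summed feature weights of each label."""
    scores = [sum(weights.get((label, f), 0.0) for f in feats) for label in labels]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def train(data, epochs=200, learning_rate=0.1):
    """Fit the weights by batch gradient ascent on the log-likelihood."""
    labels = sorted({label for _, label in data})
    weights = defaultdict(float)
    for _ in range(epochs):
        gradient = defaultdict(float)
        for utterance, gold in data:
            feats = features(utterance)
            probs = probabilities(weights, labels, feats)
            for label in labels:
                # Observed minus expected count for each (label, feature) pair.
                error = (1.0 if label == gold else 0.0) - probs[label]
                for f in feats:
                    gradient[(label, f)] += error
        for key, g in gradient.items():
            weights[key] += learning_rate * g
    return labels, weights


def predict(labels, weights, utterance):
    """Return the most probable tag for an utterance."""
    probs = probabilities(weights, labels, features(utterance))
    return max(labels, key=lambda label: probs[label])


if __name__ == "__main__":
    labels, weights = train(TRAIN)
    for text in ["do you like jazz", "uh huh"]:
        print(text, "->", predict(labels, weights, text))
```

In a synthesis pipeline of the kind described, the predicted tag for each input utterance would then be passed to the TTS system as markup; how that markup is consumed is outside the scope of this sketch.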

