We present an approach for enriching dialog-based text-to-speech (TTS) synthesis systems by explicitly controlling expressiveness through the use of dialog act tags. The dialog act tags in our framework are obtained automatically from a maximum entropy classifier trained on the Switchboard-DAMSL data set, which is unrelated to the TTS database. We compare the voice quality produced using automatic dialog act tags with that produced using human annotations of dialog acts, and with two forms of reference databases. Even though the tag inventory differs between the automatic tagger and the human annotation, exploiting either form of dialog markup yields better voice quality than the reference voices in subjective evaluation.
Bibliographic reference. Sridhar, Vivek Kumar Rangarajan / Syrdal, Ann / Conkie, Alistair D. / Bangalore, Srinivas (2011): "Enriching text-to-speech synthesis using automatic dialog act tags", In INTERSPEECH-2011, 317-320.