8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

What Concept-to-Speech Can Gain for Prosody

Markus Schnell (1), Rüdiger Hoffmann (2)

(1) Infineon Technologies AG, Germany
(2) Dresden University of Technology, Germany

This article proposes a concept-to-speech system whose prosody is learned automatically by reinforcement learning. The system, named Demosthenes, extends the text-to-speech system DreSS. Demosthenes handles template-based text generation and symbolic prosody prediction, while DreSS handles acoustic prosody and speech synthesis. The prosody predictor applies reinforcement learning, with content, given/new status, contrast, and the number of words since the last accented word as state-space indicators. The system is trained with a simple rule that assigns reward according to prediction performance on a small sample text. To gauge the gain in prosodic quality, we compare the concept-to-speech system to an existing text-to-speech system. The results indicate a clear preference for the concept-to-speech system.
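The abstract does not spell out the learning rule, so the following is only a minimal sketch of how such a predictor could be set up, not the authors' implementation. It assumes a tabular, epsilon-greedy learner with a one-step (bandit-style) value update, a binary accent/no-accent action, a reward of +1 or -1 for agreeing or disagreeing with an annotated sample, and a reading of "content" as content-word status. The sample sentence, its annotations, and all names (`SAMPLE`, `make_state`, `train`, `predict`) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical annotated sample text, one tuple per word:
# (is_content_word, is_new_information, is_contrastive, accented_in_reference)
SAMPLE = [
    (False, False, False, False),  # "the"
    (True,  True,  False, True),   # "train"
    (False, False, False, False),  # "to"
    (True,  True,  False, True),   # "Dresden"
    (True,  False, False, False),  # "leaves"
    (False, False, False, False),  # "at"
    (True,  True,  True,  True),   # "nine"
]

ACTIONS = (False, True)  # False = no accent, True = accent


def make_state(word, since_accent):
    """Build the state from the indicators named in the abstract.

    The distance to the last accented word is capped at 3 (an assumption)
    to keep the table small.
    """
    content, new, contrast, _ = word
    return (content, new, contrast, min(since_accent, 3))


def train(sample, episodes=200, alpha=0.2, epsilon=0.1, seed=0):
    """Learn action values from reward on the sample text (assumed rule)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to an estimated reward
    for _ in range(episodes):
        since_accent = 0
        for word in sample:
            state = make_state(word, since_accent)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            # Reward +1 if the prediction matches the reference, else -1.
            reward = 1.0 if action == word[3] else -1.0
            q[(state, action)] += alpha * (reward - q[(state, action)])
            # The distance counter follows the predicted accents.
            since_accent = 0 if action else since_accent + 1
    return q


def predict(q, sample):
    """Greedy accent prediction with the learned table."""
    since_accent = 0
    accents = []
    for word in sample:
        state = make_state(word, since_accent)
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        accents.append(action)
        since_accent = 0 if action else since_accent + 1
    return accents


if __name__ == "__main__":
    table = train(SAMPLE)
    print(predict(table, SAMPLE))
```

In this toy sample the reference accents coincide with new information, so the learned table ends up reproducing that pattern; a realistic setup would need a larger annotated text and possibly a multi-step return instead of the one-step update assumed here.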

Bibliographic reference. Schnell, Markus / Hoffmann, Rüdiger (2004): "What concept-to-speech can gain for prosody", in INTERSPEECH-2004, 2581-2584.