INTERSPEECH 2004 - ICSLP
This paper proposes a concept-to-speech system whose prosody is learned automatically by reinforcement learning. The system, named Demosthenes, extends the text-to-speech system DreSS: Demosthenes handles template-based text generation and symbolic prosody prediction, while DreSS handles acoustic prosody and speech synthesis. The prosody predictor is an application of reinforcement learning whose state space is built from the following indicators: content, given versus new information, contrast, and the number of words since the last accented word. Training uses a simple rule that assigns reward according to prediction performance on a small sample text. To gauge the gain in prosodic quality, we compare the concept-to-speech system with an existing text-to-speech system. The results indicate a clear preference for the concept-to-speech system.
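As a rough illustration of how such a predictor could be set up, the following minimal sketch uses tabular Q-learning to decide, word by word, whether to place an accent. It is not the Demosthenes implementation: the abstract does not name the learning algorithm, so Q-learning is a stand-in, and the boolean feature encoding, the cap on the word counter, the reward of +1 or -1, the hyperparameters, and the toy annotated sentence are all assumptions made for this example.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = leave word unaccented, 1 = place an accent

# Hypothetical annotated sample (invented for illustration only).
# Each word: (is_content, is_new, is_contrast, reference_accent)
SAMPLE = [
    (0, 0, 0, 0),  # "the"
    (1, 0, 0, 0),  # "train"   (given)
    (0, 0, 0, 0),  # "to"
    (1, 1, 0, 1),  # "Dresden" (new)
    (1, 0, 0, 0),  # "leaves"  (given)
    (0, 0, 0, 0),  # "at"
    (1, 1, 1, 1),  # "nine"    (new, contrastive)
]

def make_state(word, since_accent):
    """State = the three word features plus the words since the last accent.

    The counter is capped at 3 (an assumption) to keep the table small.
    """
    return (word[0], word[1], word[2], min(since_accent, 3))

def train(sample, episodes=500, alpha=0.2, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning; reward is +1 for matching the reference, else -1."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        since = 3  # assume no accent shortly before the utterance starts
        state = make_state(sample[0], since)
        for i, word in enumerate(sample):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            reward = 1.0 if action == word[3] else -1.0
            # The chosen action changes the counter seen by the next word.
            since = 0 if action == 1 else since + 1
            if i + 1 < len(sample):
                next_state = make_state(sample[i + 1], since)
                best_next = max(q[(next_state, a)] for a in ACTIONS)
                target = reward + gamma * best_next
            else:
                next_state = None
                target = reward
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

def predict(q, words):
    """Greedy accent prediction with a learned Q-table."""
    since = 3
    accents = []
    for word in words:
        state = make_state(word, since)
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        accents.append(action)
        since = 0 if action == 1 else since + 1
    return accents
```

Calling `predict(train(SAMPLE), SAMPLE)` returns one accent decision per word. Because the counter of words since the last accent depends on the predictor's own earlier decisions, the task is sequential, which is one reason a reinforcement learning formulation fits it.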
Bibliographic reference. Schnell, Markus / Hoffmann, Rüdiger (2004): "What concept-to-speech can gain for prosody", In INTERSPEECH-2004, 2581-2584.