5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Multi-Level Rhythm Control for Speech Synthesis Using Hybrid Data Driven and Rule-Based Approaches

Oliver Jokisch, Diane Hirschfeld, Matthias Eichner, Rudiger Hoffmann

Technical Acoustics Laboratory, Dresden University of Technology, Germany

This paper presents a multi-level concept for generating speech rhythm in the Dresden TTS system for German (DreSS). Rhythm control covers the phrase, syllabic and phonemic levels. On each of these levels, the concept allows rule-based, statistical or data-driven methods to be used as alternatives. To create the rules and to train a neural network, a new speech corpus was recorded from the original speakers of the diphone-based inventories. The corpus covers texts and single utterances and is subdivided into phrase, syllabic and phonemic databases. First results indicate that the rule-based and the trained methods generate comparable speech rhythm if the databases are uniform. Stepwise duration control on several prosodic levels shows promise as a method of producing a flexible rhythm that depends on the specific TTS application.

