Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
Generating a near-natural speech rhythm can contribute greatly to user acceptance of TTS systems. Besides the common requirements of rhythm control (correct segmental durations, robust operation, etc.), rhythmic flexibility for different applications and individual speaking styles is desired. This article describes a data-driven concept that aims to generate an individual speech rhythm for the Dresden TTS system for German (DreSS). An additional prosodic-phonetic database has been extracted from the source speakers of the existing diphone inventories (acoustic synthesis). This database is used to adjust rule-based and statistical models for duration control, and also to train an alternative artificial neural network (ANN) model. Several combinations of the models have been tested. From the current point of view, the effect of the specific model used is smaller than expected, but the appropriate design of the prosodic database seems to support the necessary variety of the rhythmic parameters. A limited individual modeling of the speech rhythm is possible. However, the global evaluation of the introduced approach reveals some contradictions; more extensive tests are required.
Bibliographic reference. Jokisch, Oliver / Hirschfeld, Diane / Eichner, Matthias / Hoffmann, Rüdiger (1998): "Creating an Individual Speech Rhythm: A Data Driven Approach", In SSW3-1998, 115-119.