Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Prosodic Data Driven Modelling of a Narrative Style in Festival TTS

Fabio Tesser (1), Piero Cosi (2), Carlo Drioli (2), Graziano Tisato (2)

(1) ITC-IRST Istituto Trentino di Cultura, Centre for Scientific and Technological Research, Povo (TN), Italy
(2) ISTC-CNR, Laboratory of Phonetics and Dialectology, Institute of Cognitive Sciences and Technology, Padova, Italy

A general data-driven procedure for creating new prosodic modules for the Italian FESTIVAL Text-To-Speech (TTS) synthesizer [1] is described. The modules are based on Classification and Regression Trees (CART). Three prosodic factors are considered: duration, pitch, and loudness. Loudness control has been implemented as an extension to the MBROLA diphone concatenative synthesizer. The prosodic models were trained on two speech corpora with different speaking styles, and the effectiveness of the CART-based prosody was assessed with a set of evaluation tests.
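
As a rough illustration of the CART idea mentioned in the abstract, the sketch below grows a small regression tree that predicts a duration value from categorical context features. It is not the authors' implementation: the feature names (`phone_class`, `stressed`, `phrase_final`), the duration values, and the helper functions are all hypothetical, chosen only to show how a tree can be learned from labelled data by repeatedly picking the split that most reduces squared error.

```python
# Illustrative sketch only: a tiny CART-style regression tree.
# Features and durations are hypothetical, not data from the paper.

def sse(values):
    """Sum of squared errors around the mean (the impurity being minimized)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets):
    """Return the (feature, value) equality test with the lowest total SSE,
    or None if no test improves on leaving the node unsplit."""
    best, best_cost = None, sse(targets)
    for feature in rows[0]:
        for value in sorted({row[feature] for row in rows}):
            yes = [t for r, t in zip(rows, targets) if r[feature] == value]
            no = [t for r, t in zip(rows, targets) if r[feature] != value]
            if not yes or not no:
                continue  # a split must send data to both branches
            cost = sse(yes) + sse(no)
            if cost < best_cost - 1e-12:
                best, best_cost = (feature, value), cost
    return best

def build_tree(rows, targets, min_samples=2):
    """Recursively grow the tree; a leaf stores the mean target of its rows."""
    split = best_split(rows, targets) if len(rows) >= min_samples else None
    if split is None:
        return {"leaf": sum(targets) / len(targets)}
    feature, value = split
    yes = [i for i, r in enumerate(rows) if r[feature] == value]
    no = [i for i, r in enumerate(rows) if r[feature] != value]
    return {
        "feature": feature,
        "value": value,
        "yes": build_tree([rows[i] for i in yes], [targets[i] for i in yes], min_samples),
        "no": build_tree([rows[i] for i in no], [targets[i] for i in no], min_samples),
    }

def predict(tree, row):
    """Walk from the root to a leaf by answering each equality test."""
    while "leaf" not in tree:
        branch = "yes" if row[tree["feature"]] == tree["value"] else "no"
        tree = tree[branch]
    return tree["leaf"]

# Hypothetical training data: context features and durations in milliseconds.
rows = [
    {"phone_class": "vowel", "stressed": "yes", "phrase_final": "no"},
    {"phone_class": "vowel", "stressed": "yes", "phrase_final": "yes"},
    {"phone_class": "vowel", "stressed": "no", "phrase_final": "no"},
    {"phone_class": "vowel", "stressed": "no", "phrase_final": "yes"},
    {"phone_class": "consonant", "stressed": "no", "phrase_final": "no"},
    {"phone_class": "consonant", "stressed": "no", "phrase_final": "yes"},
]
durations_ms = [90, 140, 60, 100, 50, 70]

tree = build_tree(rows, durations_ms)
print(tree["feature"], tree["value"])
print(predict(tree, rows[1]))
```

Under the same assumptions, a separate tree could be trained for each prosodic target by swapping the duration values for pitch or loudness values; a practical system would also stop growing or prune the tree to avoid memorizing the training data, which this toy version does not do.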

Reference

  1. P. Cosi, F. Tesser, R. Gretter, and C. A. (with Introduction by Mike Macon), "Festival speaks Italian!", in Proceedings of EUROSPEECH 2001, Aalborg, Denmark, Sept. 2001, pp. 509-512.

Bibliographic reference. Tesser, Fabio / Cosi, Piero / Drioli, Carlo / Tisato, Graziano (2004): "Prosodic data driven modelling of a narrative style in Festival TTS", in SSW5-2004, 185-190.