Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Prominence Prediction for Supersentential Prosodic Modeling based on a New Database

Jason Y. Zhang, Arthur R. Toth, Kevyn Collins-Thompson, Alan W. Black

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

Most current prosodic modeling techniques are concerned with variation within the sentence. Now that techniques like unit selection have improved the modeling of local prosodic variation, we would like to address issues of wider context in producing appropriate synthetic output. A common experience in unit selection synthesis is that a sentence that sounds natural in isolation sounds less natural when embedded in a wider context, because its prosody is inappropriate for that context. This work presents the design and creation of a speech database intended to capture significant supersentential prosodic variation. It was built specifically to support our own investigations into a notion of "prominence", which we define as a hidden variable that can contribute to surface-level prosodic realization (duration, F0, and power). The background that led to the construction of this database and our previous attempts to capture prominence are also described.


Bibliographic reference. Zhang, Jason Y. / Toth, Arthur R. / Collins-Thompson, Kevyn / Black, Alan W. (2004): "Prominence prediction for supersentential prosodic modeling based on a new database", in SSW5-2004, 203-208.