The ESCA Workshop on Speech Synthesis

September 25-28, 1990
Autrans, France

Automatic Labeling of Large Prosodic Databases : Tools, Methodology and Links with a Text-To-Speech System

Gérard Bailly, Thierry Barbe, Hai-Dong Wang

Institut de la Communication Parlée, Unité Associée au CNRS N° 368, INPG/ENSERG, Grenoble, France

This article presents a unified methodology for segmenting and labeling acoustic databases. The methodology is based entirely on a phonetic model, the temporal decomposition (TD) model, in which phonemes are seen as emergence functions (EFs) that overlap in time. Segmenting an acoustic continuum and determining its prosodic contour are intimately linked with the detection of the EFs. Because the EFs are determined automatically, the prosodic structure of utterances is coherent across the entire corpus, so statistical methods can be applied to study the links between the formal analysis of the text and the prosodic structure of the message. Since the same methodology may be applied to the segmentation of phonetic units, synthesis by concatenation of units may be performed: prosodic events detected in the prosodic database and in the phonetic units are entirely compatible. The tools presented below are speaker-independent and cover the entire analysis-to-synthesis process.
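To make the idea of overlapping emergence functions more concrete, here is a minimal Python sketch of how segment labels could be derived once the EFs are known. It is an illustration under stated assumptions, not the authors' procedure: the bell-shaped EF (`emergence`), the crossing-point boundary rule (`boundaries`), and the toy phoneme sequence are all hypothetical, and the abstract does not say how boundaries are actually placed.

```python
import math

def emergence(center, width, n_frames):
    """Hypothetical bell-shaped emergence function sampled at every frame.
    (Assumption: real EFs are estimated from the signal, not given in closed form.)"""
    return [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(n_frames)]

def boundaries(efs):
    """Illustrative rule: put the boundary between two consecutive EFs at the
    first frame, after the earlier EF's peak, where the later EF is larger."""
    cuts = []
    for left, right in zip(efs, efs[1:]):
        peak = max(range(len(left)), key=left.__getitem__)  # frame of the earlier EF's maximum
        cut = next((t for t in range(peak, len(left)) if right[t] > left[t]), len(left))
        cuts.append(cut)
    return cuts

def label(phonemes, efs):
    """Turn one EF per phoneme into (phoneme, first_frame, end_frame_exclusive) segments."""
    edges = [0] + boundaries(efs) + [len(efs[0])]
    return [(p, edges[i], edges[i + 1]) for i, p in enumerate(phonemes)]

# Toy utterance: three phonemes whose EFs overlap over 40 analysis frames.
phonemes = ["b", "a", "l"]
efs = [emergence(c, 4.0, 40) for c in (8, 21, 32)]
segments = label(phonemes, efs)
print(segments)
```

Because the labels follow directly from the EFs, the same rule yields comparable boundaries for every utterance it is applied to, which is the kind of corpus-wide coherence the abstract relies on.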


Bibliographic reference. Bailly, Gérard / Barbe, Thierry / Wang, Hai-Dong (1990): "Automatic labeling of large prosodic databases : tools, methodology and links with a text-to-speech system", in SSW1-1990, 201-204.