Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Aiuruete: A High-Quality Concatenative Text-to-Speech System for Brazilian Portuguese with Demisyllabic Analysis-Based Units and a Hierarchical Model of Rhythm Production

Plínio A. Barbosa (1), Fábio Violaro (1), Eleonora C. Albano (1), Flávio Simoes (1), Patrícia Aquino (1), Sandra Madureira (2), Edson Francozo (1)

(1) Universidade Estadual de Campinas; (2) Pontifícia Universidade Católica de São Paulo, Brazil

Aiuruete is a high-quality concatenative TTS system for Brazilian Portuguese. Its name (pronounced [aju,rue'te]) illustrates the challenge we have set ourselves as a research paradigm: to feed the system with the specificities of our language, as highlighted by current discussion of the phonology/phonetics and prosody/segment interfaces, without incurring a large computational cost. The choice of the concatenative method of synthesis was determined by a trade-off between scientific constraints (the desired human-like naturalness of the acoustic output) and practical ones (mainly a small staff and a tight schedule). Both procedural and declarative modules are described here: the ortofon, the unit inventory, the rhythm model, and the synthesis techniques. Aiuruete is still being evaluated, but its superior quality is already apparent in comparison with the previous system, which was adopted by the national telephone company.


Bibliographic reference. Barbosa, Plínio A. / Violaro, Fábio / Albano, Eleonora C. / Simoes, Flávio / Aquino, Patrícia / Madureira, Sandra / Francozo, Edson (1999): "Aiuruete: a high-quality concatenative text-to-speech system for Brazilian Portuguese with demisyllabic analysis-based units and a hierarchical model of rhythm production", In EUROSPEECH'99, 2059-2062.