Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Automatic Modeling of Duration in a Spanish Text-to-Speech System Using Neural Networks

R. Córdoba, J. A. Vallejo, J. M. Montero, J. Gutierrez-Arriola, M. A. López, Juan Manuel Pardo

Grupo de Tecnología del Habla, Departamento de Ingeniería Electrónica, Universidad Politécnica de Madrid, E.T.S.I. Telecomunicación, Ciudad Universitaria s/n, Madrid, Spain

Accurate prediction of segmental duration from text in a text-to-speech system is difficult for several reasons. One especially relevant reason is the large number of contextual factors that affect timing, together with the question of how to model them. Many parameters affect duration, but not all of them are relevant in every situation. We present a complete environment for deciding which parameters are most relevant in different situations and how best to code them. The system is based on a fully configurable neural network, and the main effort goes into the choice of input parameters, including contextual effects captured with windows of variable length.
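As an illustration only, the idea of coding contextual effects with a window of variable length could be sketched as follows. This is a minimal sketch under stated assumptions, not the coding scheme used in the paper: the one-hot coding, the padding symbol `PAD`, the toy phone inventory, and the function names `one_hot` and `window_features` are all hypothetical.

```python
from typing import List, Sequence

PAD = "#"  # hypothetical padding symbol for positions outside the utterance


def one_hot(symbol: str, inventory: Sequence[str]) -> List[float]:
    """Return a one-hot vector for `symbol` over `inventory`."""
    return [1.0 if symbol == s else 0.0 for s in inventory]


def window_features(
    phones: Sequence[str],
    index: int,
    left: int,
    right: int,
    inventory: Sequence[str],
) -> List[float]:
    """Concatenate one-hot codes for phones[index - left .. index + right].

    `left` and `right` set the window length on each side of the target
    phone, so the amount of context can be varied per experiment.
    """
    features: List[float] = []
    for pos in range(index - left, index + right + 1):
        # Positions beyond either edge of the utterance are coded as PAD.
        symbol = phones[pos] if 0 <= pos < len(phones) else PAD
        features.extend(one_hot(symbol, inventory))
    return features


# Toy inventory and phone string (assumed for illustration, not from the paper).
inventory = [PAD, "a", "k", "s"]
phones = ["k", "a", "s", "a"]

# Input vector for the first phone with one phone of context on each side.
vec = window_features(phones, index=0, left=1, right=1, inventory=inventory)
```

Widening `left` or `right` lengthens the input vector by one inventory-sized slot per added position, which is one simple way a configurable network input could trade context against input size.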



Bibliographic reference.  Córdoba, R. / Vallejo, J. A. / Montero, J. M. / Gutierrez-Arriola, J. / López, M. A. / Pardo, Juan Manuel (1999): "Automatic modeling of duration in a Spanish text-to-speech system using neural networks", In EUROSPEECH'99, 1619-1622.