Sixth European Conference on Speech Communication and Technology
Accurate prediction of segmental duration from text in a text-to-speech system is difficult for several reasons. One especially relevant reason is the large number of contextual factors that affect timing, together with the question of how to model them. Many parameters affect duration, but not all of them are relevant in every situation. We present a complete environment for deciding which parameters are most relevant in different situations and the best way to code them. The system is based on a fully configurable neural network, and the main effort goes into the parameters to be used, including contextual effects modeled with windows of variable length.
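As an illustration of what coding contextual effects with a window of variable length can look like, the following is a minimal sketch, not the authors' implementation. It assumes a one-hot code per phone and an all-zero code for positions outside the utterance; the abstract does not specify the actual coding scheme, and the names `one_hot`, `window_features`, `inventory`, and `pad` are hypothetical.

```python
from typing import List, Sequence


def one_hot(symbol: str, inventory: Sequence[str]) -> List[float]:
    """Return a binary code with a 1 at the symbol's position in the inventory.

    Symbols not in the inventory (such as the padding symbol) map to all zeros.
    """
    return [1.0 if symbol == s else 0.0 for s in inventory]


def window_features(
    phones: Sequence[str],
    index: int,
    left: int,
    right: int,
    inventory: Sequence[str],
    pad: str = "_",
) -> List[float]:
    """Concatenate the codes of the target phone and its neighbours.

    `left` and `right` set the window length on each side (an assumed
    parameterization); positions outside the utterance use `pad`.
    """
    features: List[float] = []
    for pos in range(index - left, index + right + 1):
        symbol = phones[pos] if 0 <= pos < len(phones) else pad
        features.extend(one_hot(symbol, inventory))
    return features


# Toy example: a three-phone inventory and a one-phone window on each side.
inventory = ["a", "k", "s"]
phones = ["k", "a", "s", "a"]
vector = window_features(phones, index=0, left=1, right=1, inventory=inventory)
print(len(vector))  # (left + right + 1) * len(inventory) input values
```

Under these assumptions, a vector of this kind would form part of the network input for one segment, and changing `left` and `right` changes the input size, which is one reason a configurable network is convenient when comparing window lengths.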
Bibliographic reference. Córdoba, R. / Vallejo, J. A. / Montero, J. M. / Gutierrez-Arriola, J. / López, M. A. / Pardo, Juan Manuel (1999): "Automatic modeling of duration in a Spanish text-to-speech system using neural networks", In EUROSPEECH'99, 1619-1622.