The work described in the paper compared different techniques for learning prosodic regularities from natural speech databases, with a view to future developments of ELOQUENS®, the CSELT text-to-speech system for Italian. As an alternative to explicit rule-based modelling, two adaptive algorithms, ANN (Artificial Neural Nets) and CART (Classification And Regression Trees), were applied to predict two prosodic parameters: phoneme duration and fundamental frequency. A first experiment, in which the algorithms were trained on a limited-size corpus of neutrally read declarative sentences, gave encouraging results. The paper argues that, despite their limits in providing accurate modelling of linguistic phenomena, automatic learning techniques may be considered a promising methodological framework for developing multi-voice, multi-style and multi-language text-to-speech systems, a task that requires research tools to analyze large speech databases and implementation devices to speed up computations.
Bibliographic reference. Mana, F. / Quazza, Silvia (1995): "Text-to-speech oriented automatic learning of Italian prosody", in EUROSPEECH-1995, 589-592.