September 22-25, 1997
Improving the quality of a text-to-speech synthesis system is usually treated as the arduous task of converting arbitrary text into speech. This paper reports on work carried out at CNET on building application-oriented text-to-speech systems. In most vocal services, the delivered messages are strongly constrained syntactically and use a limited vocabulary. We consider that, for our system, the most promising improvements in the overall quality of the synthetic speech signal lie in linguistic and prosodic processing. Leaving aside segmental problems of the synthetic speech signal, the current prosodic patterns are judged too monotonous to support a wide diversity of vocal services. The current effort therefore addresses the development of automatic systems that adapt the parameters of statistical prosodic models to a specific speaker's voice, under the constraint of a limited number of distinct syntactic structures. This work presents an automatic system for building "optimal" training databases from which the models' parameters are learned. The problem is formulated as a set covering problem and solved using genetic algorithms. Both an objective and a subjective evaluation show the usefulness of this approach.
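To make the set covering formulation concrete, the following is a minimal sketch of how a small training corpus might be selected with a genetic algorithm. The abstract does not describe the encoding, fitness function, or genetic operators, so everything here is an assumption made for illustration: the bit-string encoding, the greedy `repair` step, the size-2 tournament selection, and the toy syntactic labels are all hypothetical and are not taken from the paper.

```python
import random


def repair(bits, subsets, universe):
    """Greedily switch on extra subsets until the universe is covered.

    Assumption: a repair operator keeps every individual feasible;
    the paper may handle infeasible solutions differently.
    """
    bits = list(bits)
    covered = set()
    for i, b in enumerate(bits):
        if b:
            covered |= subsets[i]
    missing = universe - covered
    while missing:
        candidates = [i for i, b in enumerate(bits) if not b]
        # Unselected subset that covers the most still-missing elements.
        best = max(candidates, key=lambda i: len(subsets[i] & missing), default=None)
        if best is None or not subsets[best] & missing:
            raise ValueError("universe cannot be covered by the given subsets")
        bits[best] = 1
        missing -= subsets[best]
    return bits


def ga_set_cover(subsets, universe, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Return indices of a covering selection; fewer subsets is better."""
    rng = random.Random(seed)
    universe = set(universe)
    n = len(subsets)
    population = [
        repair([rng.randint(0, 1) for _ in range(n)], subsets, universe)
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=sum)      # cost = number of selected subsets
        next_gen = population[:2]     # elitism: keep the two best covers
        while len(next_gen) < pop_size:
            # Tournament selection of size 2 for each parent.
            p1 = min(rng.sample(population, 2), key=sum)
            p2 = min(rng.sample(population, 2), key=sum)
            # Uniform crossover followed by bit-flip mutation.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_gen.append(repair(child, subsets, universe))
        population = next_gen
    best = min(population, key=sum)
    return [i for i, b in enumerate(best) if b]


# Hypothetical toy data: each candidate sentence is described by the
# set of syntactic structures it contains.
sentences = [
    {"NP-VP", "NP-VP-PP"},
    {"NP-VP-PP", "WH-Q"},
    {"NP-VP"},
    {"WH-Q", "IMP"},
    {"IMP", "NP-VP"},
]
universe = set().union(*sentences)
chosen = ga_set_cover(sentences, universe)
print("selected sentence indices:", chosen)
```

Because of the repair step, every returned selection covers all structures; the genetic search only tries to reduce how many sentences are needed. It gives no guarantee that the selection is minimal, which is consistent with the quotation marks around "optimal" in the abstract.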
Bibliographic reference. Boeffard, Olivier / Emerard, F. (1997): "Application-dependent prosodic models for text-to-speech synthesis and automatic design of learning database corpus using genetic algorithm", In EUROSPEECH-1997, 2511-2514.