INTERSPEECH 2004 - ICSLP
The perceived quality of synthetic speech depends strongly on its prosodic naturalness. Starting from the syllable-based, adaptive prosody model IGM, the authors investigated a novel evolutionary approach that optimizes the model structure itself and ultimately aims to improve the predicted prosodic contours. A feed-forward neural network was trained on a German newsreader corpus. In parallel, the network and data configurations were automatically optimized using the Strength Pareto Evolutionary Algorithm (SPEA). While achieving prediction results similar to those of the original IGM configuration, the evolutionary optimization reduces the network and parameter complexity. This optimization method may be helpful in the further development of resource-saving prosody modules, e.g., for use in embedded text-to-speech applications, and it also eases the difficult introspection of the prosodic rules that are automatically generated during training. However, preliminary perceptual tests show no significant differences compared with synthetic stimuli based on prosodic contours predicted by the original model.
Bibliographic reference. Jokisch, Oliver / Hofmann, Michael (2004): "Evolutionary optimization of an adaptive prosody model", In INTERSPEECH-2004, 797-800.