Fourth ISCA ITRW on Speech Synthesis
August 29 - September 1, 2001
This paper presents two approaches for data-driven prediction of word-level accent labels (perceptual accents and pitch accents) for speech synthesis. In the first approach, a causal and retro-causal NN model is used to determine Bayesian a posteriori probabilities for the occurrence of a certain accent label. These probabilities are calculated using context windows of part-of-speech (POS) tags and context windows of phrase break labels. In the second approach, the probabilities determined by the NN are used as emission probabilities for the states of a Markov model (hybrid approach). The transition probabilities of the Markov model are determined by an n-gram. The two approaches are trained and tested on three different prosodically labeled databases. With both approaches, prediction accuracy was higher than that reported in other studies. For qualitative evaluation, a new evaluation scheme is presented and discussed. It is found that the first approach, which applies the NN model alone, gives the best results with respect to the quality of prosodically labeled sentences.
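The hybrid idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram as the n-gram, two hypothetical labels ("A" for accented, "N" for unaccented), invented toy probabilities, and a standard Viterbi search to find the best label sequence. The per-word posteriors stand in for the NN outputs and are used directly as emission scores.

```python
import math

def viterbi(posteriors, transitions, initial):
    """Find the most probable label sequence.

    posteriors:  list of dicts, one per word, label -> P(label | context)
                 (stand-in for the NN outputs; assumed nonzero)
    transitions: dict (prev_label, label) -> P(label | prev_label)
                 (stand-in for a bigram model; assumed nonzero)
    initial:     dict label -> P(label) for the first word
    """
    labels = list(initial)
    # Log-scores for the first word.
    score = {l: math.log(initial[l]) + math.log(posteriors[0][l]) for l in labels}
    backpointers = []
    for post in posteriors[1:]:
        new_score, pointers = {}, {}
        for cur in labels:
            # Best predecessor for the current label.
            prev = max(labels, key=lambda p: score[p] + math.log(transitions[(p, cur)]))
            new_score[cur] = (score[prev] + math.log(transitions[(prev, cur)])
                              + math.log(post[cur]))
            pointers[cur] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda l: score[l])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy data (invented for illustration): three words, two labels.
posteriors = [
    {"A": 0.60, "N": 0.40},
    {"A": 0.55, "N": 0.45},
    {"A": 0.20, "N": 0.80},
]
# Hypothetical bigram: two accented words in a row are unlikely.
transitions = {("A", "A"): 0.1, ("A", "N"): 0.9,
               ("N", "A"): 0.5, ("N", "N"): 0.5}
initial = {"A": 0.5, "N": 0.5}

# First approach (sketch): pick the most probable label for each word independently.
independent = [max(p, key=p.get) for p in posteriors]
# Second approach (sketch): combine posteriors with the bigram via Viterbi.
path = viterbi(posteriors, transitions, initial)
print(independent, path)
```

With these toy numbers the two strategies disagree on the second word: the independent decision labels it accented, while the sequence model prefers an unaccented label because the assumed bigram penalizes adjacent accents.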
Bibliographic reference. Müller, Achim F. / Hoffmann, Rüdiger (2001): "A neural network and a hybrid approach for accent label prediction", In SSW4-2001, paper 102.