Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
Phrase break prediction models in speech synthesis are classifiers that predict whether or not each word boundary is a prosodic break. These classifiers are generally trained to optimize the likelihood of prediction, and their performance is evaluated in terms of classification accuracy. We propose a minimum error rate training method for phrase break prediction. We combine multiple phrasing models in a log-linear framework and optimize the system directly for the quality of break prediction, as measured by the F-measure. We show that this method significantly improves our phrasing models. We also show how this framework allows us to design a knob that can be adjusted to increase or decrease the number of phrase breaks at synthesis time.
Index Terms: Speech Synthesis, Phrasing
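To make the idea concrete, the following is a minimal sketch of a log-linear combination of phrasing models tuned directly on the F-measure. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the search procedure, so a plain grid search stands in for the optimizer, and the names `log_odds`, `predict`, `f_measure`, and `tune_weights`, the weight grid, and the additive `bias` term used as the "knob" are all hypothetical.

```python
import math
from itertools import product


def log_odds(p, eps=1e-9):
    """Log-odds of a break, with the probability clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)


def predict(model_probs, weights, bias=0.0):
    """Predict a break wherever the weighted sum of log-odds plus bias is positive.

    model_probs: one tuple per word boundary, holding each model's P(break).
    bias: hypothetical knob; larger values yield more breaks.
    """
    return [
        sum(w * log_odds(p) for w, p in zip(weights, probs)) + bias > 0.0
        for probs in model_probs
    ]


def f_measure(predicted, reference):
    """Balanced F-measure (F1) computed on the 'break' class."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_weights(model_probs, reference, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Grid search (an assumed stand-in optimizer) for weights maximizing F1."""
    n_models = len(model_probs[0])
    best_weights, best_f = None, -1.0
    for weights in product(grid, repeat=n_models):
        f = f_measure(predict(model_probs, weights), reference)
        if f > best_f:
            best_weights, best_f = weights, f
    return best_weights, best_f


# Toy development data (fabricated for illustration only):
# two models' P(break) at six word boundaries, plus reference labels.
dev_probs = [(0.9, 0.2), (0.1, 0.7), (0.8, 0.6), (0.3, 0.4), (0.2, 0.9), (0.7, 0.1)]
dev_refs = [True, False, True, False, False, True]

weights, dev_f = tune_weights(dev_probs, dev_refs)

# Turning the knob: the break count can only grow as the bias grows.
break_counts = [sum(predict(dev_probs, weights, bias=b)) for b in (-5.0, 0.0, 5.0)]
```

In this sketch the weights are fixed after tuning, and only `bias` is changed at synthesis time, which is one simple way a log-linear score could expose a control over the number of predicted breaks.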
Bibliographic reference. Parlikar, Alok / Black, Alan W. (2013): "Minimum error rate training for phrasing in speech synthesis", In SSW8, 13-17.