Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

Minimum Error Rate Training for Phrasing in Speech Synthesis

Alok Parlikar, Alan W. Black

Carnegie Mellon University, Pittsburgh, PA, USA

Phrase break prediction models in speech synthesis are classifiers that predict whether or not each word boundary is a prosodic break. These classifiers are generally trained to maximize the likelihood of the training data, and their performance is evaluated in terms of classification accuracy. We propose a minimum error rate training method for phrase break prediction. We combine multiple phrasing models in a log-linear framework and optimize the system directly for the quality of break prediction, as measured by the F-measure. We show that this method significantly improves our phrasing models. We also show how this framework allows us to design a knob that can be adjusted to increase or decrease the number of phrase breaks at synthesis time.

Index Terms: Speech Synthesis, Phrasing
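The log-linear combination and F-measure tuning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probabilities, the exhaustive grid search (standing in for whatever optimizer the paper uses), and the use of a bias term as the break-frequency "knob" are all assumptions made for illustration.

```python
import math
from itertools import product


def loglinear_break_score(probs, weights, bias=0.0):
    """Weighted sum of per-model log-odds of a break, plus a bias.

    probs   : one P(break) per component model for a single word boundary
    weights : one weight per component model (hypothetical names)
    bias    : assumed 'knob'; raising it favors more breaks
    """
    eps = 1e-9
    score = bias
    for p, w in zip(probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        score += w * (math.log(p) - math.log(1.0 - p))
    return score


def predict(all_probs, weights, bias=0.0):
    """Predict a break wherever the combined score is positive."""
    return [loglinear_break_score(p, weights, bias) > 0 for p in all_probs]


def f_measure(pred, gold):
    """Balanced F-measure (F1) on the 'break' class."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_weights(all_probs, gold, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Pick the weight vector that maximizes F-measure on held-out data.

    Exhaustive grid search is an assumption chosen for brevity.
    """
    n_models = len(all_probs[0])
    best_w, best_f = None, -1.0
    for w in product(grid, repeat=n_models):
        f = f_measure(predict(all_probs, w), gold)
        if f > best_f:
            best_w, best_f = w, f
    return best_w, best_f


# Toy held-out set: P(break) from two hypothetical models per boundary.
dev_probs = [(0.9, 0.4), (0.2, 0.6), (0.7, 0.8),
             (0.3, 0.1), (0.6, 0.3), (0.1, 0.2)]
dev_gold = [True, False, True, False, True, False]

weights, dev_f = tune_weights(dev_probs, dev_gold)
fewer_breaks = sum(predict(dev_probs, weights, bias=-2.0))
more_breaks = sum(predict(dev_probs, weights, bias=2.0))
```

The point of the sketch is that the weights are chosen by the evaluation metric itself rather than by likelihood, and that shifting the decision threshold after tuning changes how many breaks are produced without retraining the component models.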

Bibliographic reference. Parlikar, Alok / Black, Alan W. (2013): "Minimum error rate training for phrasing in speech synthesis", in SSW8, 13-17.