Automatic prediction of pitch accent assignment is an important but challenging task in text-to-speech synthesis (TTS). Early work in accent prediction relied on simple word-class distinctions, but more recently, sophisticated inductive learning models using multiple features have been applied to the problem. For our neural network accent classifier, we developed a corpus labeled according to judgments of accent assignment appropriateness in synthesized speech rather than the usual ToBI annotation guidelines. Because the resulting training set was imbalanced, the baseline neural network we developed for this task achieved a high accuracy rate (84%) but performed only slightly better than chance according to our ROC analysis. Balancing our training data using downsizing, oversampling, and cost-based post-processing yielded significant improvement on the ROC measure, which is more informative than accuracy for imbalanced data. We anticipate that balance adjustments and the inclusion of more complex features will lead to further improvement.
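The gap between accuracy and ROC performance on imbalanced data, and two of the balancing strategies named in the abstract, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 80/20 class split, the `majority`/`minority` labels, and the helper names `downsize`, `oversample`, and `accuracy_and_roc_point` are hypothetical and do not come from the paper's corpus or code. Cost-based post-processing is not covered.

```python
import random
from collections import Counter, defaultdict


def group_by_label(examples):
    """Group (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[label].append((features, label))
    return groups


def downsize(examples, seed=0):
    """Randomly discard examples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = min(len(group) for group in groups.values())
    balanced = [ex for group in groups.values() for ex in rng.sample(group, target)]
    rng.shuffle(balanced)
    return balanced


def oversample(examples, seed=0):
    """Duplicate randomly chosen examples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def accuracy_and_roc_point(gold, predicted, positive):
    """Return accuracy plus the (TPR, FPR) point a fixed classifier occupies in ROC space."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, tpr, fpr


# Hypothetical toy data with an assumed 80/20 class split (not the paper's corpus).
toy = [((i,), "majority") for i in range(80)] + [((i,), "minority") for i in range(80, 100)]
gold = [label for _, label in toy]

# A degenerate classifier that always predicts the majority class.
always_majority = ["majority"] * len(gold)
acc, tpr, fpr = accuracy_and_roc_point(gold, always_majority, positive="minority")
print("accuracy:", acc, "TPR:", tpr, "FPR:", fpr)

# Class counts after each balancing strategy.
print("downsized:", Counter(label for _, label in downsize(toy)))
print("oversampled:", Counter(label for _, label in oversample(toy)))
```

On this toy split, the always-majority predictor reaches 0.8 accuracy while its ROC point sits at (0, 0), which lies on the chance diagonal. Downsizing trades training data for balance, whereas oversampling keeps every example at the cost of duplicated minority items; which trade-off suits a given corpus is an empirical question.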
Cite as: Mishra, T., Tucker Prud'hommeaux, E., van Santen, J.P.H. (2007) Word accentuation prediction using a neural net classifier. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 246-251.
@inproceedings{mishra07_ssw, author={Taniya Mishra and Emily {Tucker Prud'hommeaux} and Jan P. H. van Santen}, title={{Word accentuation prediction using a neural net classifier}}, year=2007, booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)}, pages={246--251} }