8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Evaluation of Corpus Based Tone Prediction in Mismatched Environments for Greek TtS Synthesis

Panagiotis Zervas, Nikos Fakotakis, George Kokkinakis, George Kouroupetroglou, Gerasimos Xydas

University of Patras, Greece

One of the main aspects of Text-to-Speech (TtS) synthesis is the successful prediction of tonal events. In this work we evaluate corpus-based models in operational environments other than those they were trained on. Two pitch accent frameworks, derived from linguistically enriched speech data from a generic domain and a limited domain, were first evaluated using 10-fold cross-validation. As a second step, we performed cross-domain validation, training on one domain and testing on the other. Due to the heterogeneity of the data, we further employed three machine learning approaches: CART, Naive Bayes and Bayesian networks. The results demonstrate that the limited-domain models achieve on average 10% higher accuracy in self-domain evaluation, while the generic models maintain their performance regardless of the application domain.
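The abstract does not give the features or the implementation, so the following is only a minimal sketch of the two evaluation protocols it names. It assumes each word is described by a tuple of categorical features and labeled with a pitch accent class, and it uses a from-scratch categorical Naive Bayes (one of the three learners listed; CART and Bayesian networks are not shown). The feature values, the labels "H*" and "NONE", and the toy `GENERIC` and `LIMITED` samples are hypothetical illustrations, not the paper's data or results.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical sample format: (features, label), where features is a tuple of
# categorical values such as (part_of_speech, lexical_stress) and label is a
# pitch accent class. Illustrative only; not the paper's feature set.

def train_nb(samples):
    """Train a categorical Naive Bayes model from (features, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    # feature_counts[label][i][value] = count of `value` at feature index i
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen at each feature index
    for feats, label in samples:
        for i, v in enumerate(feats):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    return label_counts, feature_counts, values

def predict_nb(model, feats):
    """Return the label with the highest Laplace-smoothed log posterior."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        score = math.log(n_label / total)
        for i, v in enumerate(feats):
            k = len(values[i]) + 1  # +1 reserves probability for unseen values
            score += math.log((feature_counts[label][i][v] + 1) / (n_label + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, samples):
    """Fraction of samples whose label the model predicts correctly."""
    correct = sum(predict_nb(model, f) == y for f, y in samples)
    return correct / len(samples)

def k_fold_accuracy(samples, k=10, seed=0):
    """Self-domain evaluation: mean accuracy over k train/test splits."""
    if len(samples) < k:
        raise ValueError("need at least k samples so no test fold is empty")
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train_nb(train), folds[i]))
    return sum(scores) / k

def cross_domain_accuracy(train_domain, test_domain):
    """Mismatched evaluation: train on one domain, test on the other."""
    return accuracy(train_nb(train_domain), test_domain)

# Toy, made-up samples standing in for a generic and a limited domain.
GENERIC = [(("noun", "stressed"), "H*"), (("verb", "stressed"), "H*"),
           (("det", "unstressed"), "NONE"), (("prep", "unstressed"), "NONE")] * 5
LIMITED = [(("name", "stressed"), "H*"), (("number", "stressed"), "H*"),
           (("det", "unstressed"), "NONE")] * 4

if __name__ == "__main__":
    print("generic, 10-fold:", k_fold_accuracy(GENERIC))
    print("limited, 10-fold:", k_fold_accuracy(LIMITED))
    print("generic -> limited:", cross_domain_accuracy(GENERIC, LIMITED))
    print("limited -> generic:", cross_domain_accuracy(LIMITED, GENERIC))
```

The Laplace smoothing matters specifically for the cross-domain case: a part-of-speech value that never occurs in the training domain (here "name" or "number") still receives a small non-zero probability instead of zeroing out the posterior, so the remaining features can decide the prediction.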


Bibliographic reference. Zervas, Panagiotis / Fakotakis, Nikos / Kokkinakis, George / Kouroupetroglou, George / Xydas, Gerasimos (2004): "Evaluation of corpus based tone prediction in mismatched environments for Greek TtS synthesis", In INTERSPEECH-2004, 761-764.