INTERSPEECH 2004 - ICSLP
A central aspect of Text-to-Speech (TtS) synthesis is the successful prediction of tonal events. In this work we evaluate corpus-based models in operational environments other than those they were trained in. Two pitch accent frameworks, derived from linguistically enriched speech data from a generic domain and from a limited domain, were first evaluated with 10-fold cross-validation. As a second step, we applied cross-domain validation. Due to the heterogeneity of the data, we further employed three machine learning approaches: CART, Naive Bayes, and Bayesian networks. The results demonstrate that the limited domain models achieve on average 10% higher accuracy in self-domain evaluation, while the generic models preserve their performance regardless of the domain of application.
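The two evaluation settings named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' setup: the feature set (part of speech, phrase position), the labelling rules, and the toy data are all hypothetical, and a small hand-written categorical Naive Bayes stands in for the three learners used in the paper. The sketch only contrasts self-domain 10-fold cross-validation with training on one domain and testing on the other.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny categorical Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)             # class -> number of training items
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(X, y):
            for i, v in enumerate(row):
                self.counts[(i, label)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, X):
        def log_posterior(row, c):
            lp = math.log(self.prior[c] / self.n)
            for i, v in enumerate(row):
                num = self.counts[(i, c)][v] + 1
                den = self.prior[c] + len(self.values[i]) + 1  # +1 slot for unseen values
                lp += math.log(num / den)
            return lp
        return [max(self.classes, key=lambda c: log_posterior(r, c)) for r in X]


def accuracy(model, X, y):
    return sum(p == t for p, t in zip(model.predict(X), y)) / len(y)


def k_fold_accuracy(X, y, k=10):
    """Self-domain evaluation: fold j holds out items j, j+k, j+2k, ..."""
    scores = []
    for j in range(k):
        held_out = set(range(j, len(y), k))
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [t for i, t in enumerate(y) if i not in held_out]
        X_te = [X[i] for i in sorted(held_out)]
        y_te = [y[i] for i in sorted(held_out)]
        scores.append(accuracy(NaiveBayes().fit(X_tr, y_tr), X_te, y_te))
    return sum(scores) / k


def cross_domain_accuracy(X_train, y_train, X_test, y_test):
    """Mismatched evaluation: train on one domain, test on the other."""
    return accuracy(NaiveBayes().fit(X_train, y_train), X_test, y_test)


def make_domain(accented_pos):
    """Hypothetical toy corpus: a word is accented iff its POS is in accented_pos."""
    X, y = [], []
    for _ in range(5):
        for pos in ("noun", "verb", "adj", "func"):
            for place in ("initial", "medial", "final"):
                X.append((pos, place))
                y.append("accent" if pos in accented_pos else "none")
    return X, y


# Invented labelling rules, chosen only so that the two domains disagree.
X_gen, y_gen = make_domain({"noun", "adj"})  # stand-in for the generic domain
X_lim, y_lim = make_domain({"noun"})         # stand-in for the limited domain

self_generic = k_fold_accuracy(X_gen, y_gen)
self_limited = k_fold_accuracy(X_lim, y_lim)
generic_on_limited = cross_domain_accuracy(X_gen, y_gen, X_lim, y_lim)
limited_on_generic = cross_domain_accuracy(X_lim, y_lim, X_gen, y_gen)

print("self-domain (generic):", self_generic)
print("self-domain (limited):", self_limited)
print("generic -> limited:", generic_on_limited)
print("limited -> generic:", limited_on_generic)
```

The numbers printed by this sketch reflect only the invented toy rules and say nothing about the results reported in the paper.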
Bibliographic reference. Zervas, Panagiotis / Fakotakis, Nikos / Kokkinakis, George / Kouroupetroglou, George / Xydas, Gerasimos (2004): "Evaluation of corpus based tone prediction in mismatched environments for Greek TTS synthesis", In INTERSPEECH-2004, 761-764.