In this paper we apply adaptive modeling methods in Hidden Semi-Markov Model (HSMM) based speech synthesis to three language varieties: standard Austrian German, one Middle Bavarian dialect (Upper Austria, Bad Goisern), and one South Bavarian dialect (East Tyrol, Innervillgraten). We investigate adaptation methods such as dialect-adaptive training and dialect clustering, which can exploit the phone sets shared by the dialects and the standard variety, alongside speaker-dependent modeling. We show that most adaptive and speaker-dependent methods achieve a good score on overall (speaker and variety) similarity. For overall quality, there is in general no significant difference between adaptive and speaker-dependent methods on the present data set.
Index Terms: speech synthesis, dialect, voice modeling, adaptation
Cite as: Toman, M., Pucher, M., Schabus, D. (2013) Multi-variety adaptive acoustic modeling in HSMM-based speech synthesis. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 83-87
@inproceedings{toman13b_ssw,
  author={Markus Toman and Michael Pucher and Dietmar Schabus},
  title={{Multi-variety adaptive acoustic modeling in HSMM-based speech synthesis}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={83--87}
}