We present and compare different approaches to cross-variety speaker transformation in Hidden Semi-Markov Model (HSMM) based speech synthesis that allow an arbitrary speaker’s voice to be transformed from one variety to another. The methods are applied to three varieties: standard Austrian German, one Middle Bavarian dialect (Upper Austria, Bad Goisern) and one South Bavarian dialect (East Tyrol, Innervillgraten). To map data between HSMM states, we use the Kullback-Leibler divergence, transfer the probability density functions to the decision tree of the other variety, and perform speaker adaptation. We investigate an existing data mapping method and a method that constrains the mappings for common phones, and show that both methods can retain speaker similarity and variety similarity. Furthermore, we show that in some cases the constrained mapping method gives better results than the standard method.
Index Terms: speech synthesis, dialect, transformation, language variety
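The state mapping mentioned in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes each HSMM state is summarized by a single diagonal-covariance Gaussian, uses a symmetrized Kullback-Leibler divergence as the distance, and models the constraint for common phones as "search only among target states of the same phone". The function names `kl_diag_gauss`, `symmetric_kl` and `map_states`, the state representation and the toy values are all hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(x, dtype=float) for x in (mu_p, var_p, mu_q, var_q)
    )
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def symmetric_kl(p, q):
    """Symmetrized divergence; p and q are (mean, variance) pairs."""
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)


def map_states(src, tgt, common_phones=frozenset()):
    """Map each source state to its closest target state.

    src, tgt: dict of state_id -> (phone, mean, variance).
    common_phones: phones assumed to exist in both varieties. For these,
    the search is restricted to target states of the same phone
    (an assumed reading of the constrained mapping).
    """
    mapping = {}
    for sid, (phone, mu, var) in src.items():
        candidates = tgt
        if phone in common_phones:
            same_phone = {t: s for t, s in tgt.items() if s[0] == phone}
            if same_phone:  # fall back to all states if none match
                candidates = same_phone
        mapping[sid] = min(
            candidates,
            key=lambda t: symmetric_kl((mu, var), candidates[t][1:]),
        )
    return mapping


# Toy one-dimensional states (made-up values, for illustration only).
source_states = {"a1": ("a", [0.0], [1.0])}
target_states = {
    "x": ("a", [2.0], [1.0]),
    "y": ("o", [0.1], [1.0]),
}

unconstrained = map_states(source_states, target_states)
constrained = map_states(source_states, target_states, common_phones={"a"})
```

In this toy setup the unconstrained search picks the acoustically closest state regardless of phone identity, while the constrained search stays within the states of the shared phone. In a full system the mapped probability density functions would then be transferred to the decision tree of the other variety and followed by speaker adaptation, which this sketch does not cover.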
Cite as: Toman, M., Pucher, M., Schabus, D. (2013) Cross-variety speaker transformation in HSMM-based speech synthesis. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 77-81
@inproceedings{toman13_ssw,
  author={Markus Toman and Michael Pucher and Dietmar Schabus},
  title={{Cross-variety speaker transformation in HSMM-based speech synthesis}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={77--81}
}