Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
We present and compare approaches to cross-variety speaker transformation in Hidden Semi-Markov Model (HSMM) based speech synthesis that transform an arbitrary speaker's voice from one variety to another. The methods are applied to three varieties: standard Austrian German, one Middle Bavarian dialect (Upper Austria, Bad Goisern), and one South Bavarian dialect (East Tyrol, Innervillgraten). To map data between HSMM states we use the Kullback-Leibler divergence, transfer the probability density functions to the decision tree of the other variety, and perform speaker adaptation. We investigate an existing data mapping method and a method that constrains the mappings for common phones, and show that both methods can retain speaker similarity and variety similarity. Furthermore, we show that in some cases the constrained mapping method gives better results than the standard method.
Index Terms: speech synthesis, dialect, transformation, language variety
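The state mapping step can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: each HSMM state is reduced to a single diagonal-covariance Gaussian, the divergence is symmetrised, and "constraining the mappings for common phones" is read as restricting candidates to target states of the same phone whenever that phone exists in both varieties. The names `kl_diag_gauss`, `symmetric_kl`, and `map_states`, and the toy state inventories, are hypothetical.

```python
import math

def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(p, q):
    """Symmetrised divergence (an assumption; the abstract only names KLD)."""
    return (kl_diag_gauss(p["mean"], p["var"], q["mean"], q["var"])
            + kl_diag_gauss(q["mean"], q["var"], p["mean"], p["var"]))

def map_states(source, target, constrain_common_phones=False):
    """Map every source state to the target state with minimum divergence.

    With constrain_common_phones=True, a source state whose phone also
    occurs in the target variety may only map to states of that phone
    (one possible reading of the constrained method).
    """
    mapping = {}
    for name, state in source.items():
        candidates = target
        if constrain_common_phones:
            same_phone = {t: s for t, s in target.items()
                          if s["phone"] == state["phone"]}
            if same_phone:  # phone is common to both varieties
                candidates = same_phone
        mapping[name] = min(
            candidates, key=lambda t: symmetric_kl(state, candidates[t])
        )
    return mapping

# Toy one-dimensional state inventories (illustrative values only).
dialect = {
    "a_s1":  {"phone": "a",  "mean": [0.0], "var": [1.0]},
    "oa_s1": {"phone": "oa", "mean": [2.0], "var": [1.0]},
}
standard = {
    "a_t1": {"phone": "a", "mean": [1.0], "var": [1.0]},
    "e_t1": {"phone": "e", "mean": [0.1], "var": [1.0]},
    "o_t1": {"phone": "o", "mean": [2.2], "var": [1.0]},
}

unconstrained = map_states(dialect, standard)
constrained = map_states(dialect, standard, constrain_common_phones=True)
```

In this toy setup the unconstrained mapping sends `a_s1` to the acoustically closest state `e_t1`, while the constrained mapping keeps it on `a_t1`; the dialect-only phone `oa` falls back to the unconstrained search in both cases.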
Bibliographic reference. Toman, Markus / Pucher, Michael / Schabus, Dietmar (2013): "Cross-variety speaker transformation in HSMM-based speech synthesis", In SSW8, 77-81.