Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
We present a new method to rapidly adapt the models of a statistical synthesizer to the voice of a new speaker. We apply a relatively simple linear transform that consists of a vocal tract length normalization (VTLN) part and a long-term average cepstral correction part. Despite the inherent limitations of this approach, we show that it effectively reduces the gap between source and target voices using only one reference utterance and no phonetic segmentation. In addition, by using a minimum generation error criterion, we avoid some of the problems that have been reported to arise when using a maximum likelihood criterion in VTLN.
Index Terms: statistical parametric speech synthesis, speaker adaptation, vocal tract length normalization
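The structure of such a transform can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes plain (unwarped) cepstral vectors, a bilinear all-pass warping function controlled by a single factor alpha, and a warping matrix obtained numerically by least squares. The function names (bilinear_warp, vtln_matrix, adapt) and the parameter n_freq are hypothetical, and the estimation of alpha with a minimum generation error criterion is not shown.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Bilinear (all-pass) warping of frequencies omega in [0, pi] (assumed warping function)."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))

def cosine_basis(omega, order):
    """Matrix B such that B @ c is the log-amplitude spectrum of cepstrum c sampled at omega."""
    k = np.arange(order + 1)
    B = 2.0 * np.cos(np.outer(omega, k))
    B[:, 0] = 1.0  # the zeroth cepstral coefficient is not doubled
    return B

def vtln_matrix(alpha, order, n_freq=512):
    """Linear VTLN matrix A so that A @ c approximates the cepstrum of the warped spectrum."""
    omega = np.linspace(0.0, np.pi, n_freq)
    B = cosine_basis(omega, order)
    B_warped = cosine_basis(bilinear_warp(omega, alpha), order)
    # Solve B @ A = B_warped in the least-squares sense (truncation to `order` is approximate).
    A, *_ = np.linalg.lstsq(B, B_warped, rcond=None)
    return A

def adapt(cepstra, alpha, bias):
    """Apply the VTLN matrix and an additive cepstral correction to each row of `cepstra`."""
    A = vtln_matrix(alpha, cepstra.shape[1] - 1)
    return cepstra @ A.T + bias
```

In this sketch, the additive term could be taken as the difference between the long-term average cepstrum of the reference utterance and that of the warped source vectors. This is one plausible reading of the average cepstral correction mentioned above, not a detail stated in the abstract.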
Bibliographic reference. Erro, Daniel / Alonso, Agustin / Serrano, Luis / Navas, Eva / Hernaez, Inma (2013): "New method for rapid vocal tract length adaptation in HMM-based speech synthesis", In SSW8, 125-128.