We present a new method to rapidly adapt the models of a statistical synthesizer to the voice of a new speaker. We apply a relatively simple linear transform that consists of a vocal tract length normalization (VTLN) part and a long-term average cepstral correction part. Despite the logical limitations of this approach, we will show that it effectively reduces the gap between source and target voices with only one reference utterance and without any phonetic segmentation. In addition, by using a minimum generation error criterion we avoid some of the problems that have been reported to arise when using a maximum likelihood criterion in VTLN.
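As an illustration of the kind of transform described above, the following is a minimal sketch, not the authors' implementation. It assumes that the VTLN part is a bilinear (all-pass) frequency warping with a single factor `alpha`, that the warping is turned into a matrix acting on cepstral vectors by a least-squares fit on a frequency grid, and that the average correction is a bias vector matching long-term cepstral means. The function names, the grid size, and the omission of the minimum generation error estimation of `alpha` are all assumptions made for this example.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """All-pass (bilinear) warping of angular frequencies in [0, pi]."""
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )


def vtln_matrix(order, alpha, n_freq=512):
    """Matrix A such that A @ c approximates the cepstrum of the warped spectrum.

    The log-spectrum is modeled as S(w) = sum_k c[k] * cos(k * w); constant
    scaling factors are assumed to be folded into the coefficients.
    """
    omega = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(order + 1)
    basis = np.cos(np.outer(omega, k))                          # S(w)
    warped = np.cos(np.outer(bilinear_warp(omega, alpha), k))   # S(warp(w))
    # Least-squares solution of basis @ A = warped.
    A, *_ = np.linalg.lstsq(basis, warped, rcond=None)
    return A


def average_correction(source_frames, target_frames, A):
    """Bias b that aligns the long-term cepstral means after warping."""
    return target_frames.mean(axis=0) - A @ source_frames.mean(axis=0)


def apply_transform(frames, A, b):
    """Apply y = A x + b to every cepstral frame (one frame per row)."""
    return frames @ A.T + b
```

In this sketch, `alpha = 0` leaves the cepstra unchanged apart from the bias, and the sign of `alpha` controls the direction of the warping. How `alpha` is chosen is left open here; the method described above estimates it with a minimum generation error criterion.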
Index Terms: statistical parametric speech synthesis, speaker adaptation, vocal tract length normalization
Cite as: Erro, D., Alonso, A., Serrano, L., Navas, E., Hernaez, I. (2013) New method for rapid vocal tract length adaptation in HMM-based speech synthesis. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 125-128.
@inproceedings{erro13_ssw, author={Daniel Erro and Agustin Alonso and Luis Serrano and Eva Navas and Inma Hernaez}, title={{New method for rapid vocal tract length adaptation in HMM-based speech synthesis}}, year=2013, booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)}, pages={125--128} }