Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

New Method for Rapid Vocal Tract Length Adaptation in HMM-based Speech Synthesis

Daniel Erro (1,2), Agustin Alonso (2), Luis Serrano (2), Eva Navas (2), Inma Hernaez (2)

(1) Ikerbasque - UPV/EHU, Spain; (2) University of the Basque Country (UPV/EHU), Spain

We present a new method to rapidly adapt the models of a statistical synthesizer to the voice of a new speaker. We apply a relatively simple linear transform that consists of a vocal tract length normalization (VTLN) part and a long-term average cepstral correction part. Despite the inherent limitations of this approach, we show that it effectively reduces the gap between source and target voices using only one reference utterance and no phonetic segmentation. In addition, by using a minimum generation error criterion, we avoid some of the problems that have been reported to arise when a maximum likelihood criterion is used in VTLN.

Index Terms: statistical parametric speech synthesis, speaker adaptation, vocal tract length normalization
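To make the structure of the transform described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bilinear all-pass frequency warping for the VTLN part (a common choice that the abstract does not specify), cepstra defined by log S(w) = sum_n c_n cos(n w), and a bias computed as the difference between long-term cepstral means. The names `bilinear_warp_matrix`, `adapt`, `alpha`, `source` and `reference` are hypothetical. The warping factor is taken as given here; the minimum generation error estimation mentioned in the abstract is not reproduced.

```python
import numpy as np


def bilinear_warp_matrix(alpha, order, n_grid=4096):
    """Matrix A such that A @ c is the cepstrum of the frequency-warped
    log-spectrum (assumed convention: log S(w) = sum_n c[n] * cos(n * w))."""
    # Midpoint grid on [0, pi] used for numerical integration.
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    # Bilinear (all-pass) warping of the frequency axis; assumed warping function.
    warped = omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                      1.0 - alpha * np.cos(omega))
    idx = np.arange(order + 1)
    cos_in = np.cos(np.outer(idx, warped))   # cos(n * warped frequency)
    cos_out = np.cos(np.outer(idx, omega))   # cos(m * frequency)
    # Cosine-series projection: factor 1 for m = 0 and 2 for m > 0.
    scale = np.where(idx == 0, 1.0, 2.0)
    return scale[:, None] * (cos_out @ cos_in.T) / n_grid


def adapt(source, reference, alpha):
    """Apply y = A(alpha) x + b to every source cepstral vector (rows).

    The bias b matches the long-term mean of the warped source to the
    long-term mean of the reference utterance, so no frame alignment or
    phonetic segmentation is required (illustrative assumption)."""
    A = bilinear_warp_matrix(alpha, source.shape[1] - 1)
    warped = source @ A.T
    bias = reference.mean(axis=0) - warped.mean(axis=0)
    return warped + bias, bias
```

With `alpha = 0` the warping matrix reduces to the identity, so the transform degenerates to a pure mean-cepstrum correction. In a synthesizer the same affine map would presumably be applied to the model parameters rather than to individual frames.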

