In this paper we investigate frequency warping based explicitly on the mapping between the first four formant frequencies of five long vowels recorded by source and target speakers. A universal warping function is constructed to improve MLLR-based speaker adaptation performance in TTS. The function is used to warp the frequency scale of the source speaker's data toward that of the target speaker's data, and an HMM is trained on the frequency-warped features of the source speaker. Finally, MLLR-based speaker adaptation is applied to the trained HMM to synthesize the target speaker's speech. When tested on a database of 4,000 sentences (source speaker) and 100 sentences from a male and a female speaker (target speakers), formant-based frequency warping is found to be very effective in reducing log spectral distortion relative to the system without formant frequency warping, and this improvement is also confirmed subjectively in AB preference and ABX speaker similarity listening tests.
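As a rough illustration of how a warping function might be built from paired formant measurements, the sketch below averages each formant over the recorded vowels and joins the resulting source/target anchor points with straight line segments. The abstract does not state the functional form of the warping, so the piecewise-linear shape, the pinned endpoints at 0 Hz and the Nyquist frequency, the 8000 Hz default, and the names `build_warp` and `warp` are all assumptions made for this example, not the authors' method.

```python
from bisect import bisect_right
from statistics import mean


def build_warp(source_formants, target_formants, nyquist_hz=8000.0):
    """Return a piecewise-linear map from source to target frequency (Hz).

    source_formants / target_formants: one list of formant frequencies per
    vowel, e.g. five vowels with four formants each (hypothetical layout).
    """
    n_formants = len(source_formants[0])
    # Average each formant over the vowels to get one anchor per formant.
    src = [mean(v[i] for v in source_formants) for i in range(n_formants)]
    tgt = [mean(v[i] for v in target_formants) for i in range(n_formants)]
    # Assumption: pin the endpoints so [0, Nyquist] maps onto itself.
    xs = [0.0] + src + [nyquist_hz]
    ys = [0.0] + tgt + [nyquist_hz]
    for seq in (xs, ys):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("anchor frequencies must be strictly increasing")

    def warp(freq_hz):
        # Clamp to the valid band, then find the segment containing freq_hz.
        f = min(max(freq_hz, 0.0), nyquist_hz)
        k = min(bisect_right(xs, f), len(xs) - 1) - 1
        # Linear interpolation between the two neighbouring anchors.
        t = (f - xs[k]) / (xs[k + 1] - xs[k])
        return ys[k] + t * (ys[k + 1] - ys[k])

    return warp


if __name__ == "__main__":
    # Made-up formant values for a single vowel, for demonstration only.
    warp = build_warp([[500, 1500, 2500, 3500]], [[600, 1800, 2750, 3850]])
    for f in (0, 500, 1000, 8000):
        print(f, "->", warp(f))
```

In a setup like the one the abstract describes, such a function would be applied to the frequency axis of the source speaker's spectra before feature extraction and HMM training; how the warped spectra are resampled is left out of this sketch.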
Bibliographic reference. Zhuang, Xin / Qian, Yao / Soong, Frank K. / Wu, Yijian / Zhang, Bo (2010): "Formant-based frequency warping for improving speaker adaptation in HMM TTS", In INTERSPEECH-2010, 817-820.