11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Formant-Based Frequency Warping for Improving Speaker Adaptation in HMM TTS

Xin Zhuang (1), Yao Qian (1), Frank K. Soong (1), Yijian Wu (2), Bo Zhang (3)

(1) Microsoft Research, China
(2) Microsoft, China
(3) Nankai University, China

In this paper we investigate frequency warping based explicitly on the mapping between the first four formant frequencies of five long vowels recorded by the source and target speakers. A universal warping function is constructed to improve MLLR-based speaker adaptation performance in TTS. The function is used to warp the frequency scale of the source speaker's data toward that of the target speaker's data, and an HMM is trained on the frequency-warped features of the source speaker. Finally, MLLR-based speaker adaptation is applied to the trained HMM to synthesize the target speaker's speech. When tested on a database of 4,000 sentences (source speaker) and 100 sentences from a male and a female speaker (target speakers), formant-based frequency warping is found to be very effective in reducing log spectral distortion relative to the system without formant frequency warping. This improvement is also confirmed subjectively in AB preference and ABX speaker similarity listening tests.
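The warping step described in the abstract can be illustrated with a minimal sketch. It assumes a piecewise-linear warping function anchored at the per-formant averages over the five vowels, with the endpoints 0 Hz and the Nyquist frequency held fixed; the abstract does not state the functional form, so this choice is an assumption made only for illustration. The function names (`build_warp`, `warp_frequency`, `warp_spectrum`), the 16 kHz sampling rate, and all formant values are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def build_warp(src_formants, tgt_formants, nyquist):
    """Build anchor points for a piecewise-linear frequency warp.

    src_formants, tgt_formants: arrays of shape (n_vowels, 4) holding
    F1..F4 in Hz for the same vowels from the source and target speakers.
    Averaging over vowels yields one "universal" set of anchors
    (an assumption of this sketch).
    """
    src_mean = np.mean(np.asarray(src_formants, dtype=float), axis=0)
    tgt_mean = np.mean(np.asarray(tgt_formants, dtype=float), axis=0)
    # Pin the endpoints so the warp maps [0, nyquist] onto itself.
    src_anchors = np.concatenate(([0.0], src_mean, [nyquist]))
    tgt_anchors = np.concatenate(([0.0], tgt_mean, [nyquist]))
    return src_anchors, tgt_anchors

def warp_frequency(f, src_anchors, tgt_anchors):
    """Map source-speaker frequencies (Hz) to the target speaker's scale."""
    return np.interp(f, src_anchors, tgt_anchors)

def warp_spectrum(spectrum, freqs, src_anchors, tgt_anchors):
    """Resample a source spectrum onto the warped frequency scale.

    For each output frequency g, look up the source frequency that the
    warp sends to g (the inverse warp), then read the spectrum there.
    """
    inverse = np.interp(freqs, tgt_anchors, src_anchors)
    return np.interp(inverse, freqs, spectrum)

# Hypothetical formant values (Hz) for five vowels; not from the paper.
src_f = np.array([
    [300.0, 2300.0, 3000.0, 3700.0],
    [500.0, 1900.0, 2600.0, 3600.0],
    [750.0, 1200.0, 2500.0, 3500.0],
    [500.0,  900.0, 2500.0, 3500.0],
    [320.0,  800.0, 2400.0, 3400.0],
])
tgt_f = src_f * 1.15          # pretend the target has uniformly higher formants
nyquist = 8000.0              # assumes 16 kHz sampling

src_a, tgt_a = build_warp(src_f, tgt_f, nyquist)

# A toy spectrum with a single peak at the mean source F2.
freqs = np.linspace(0.0, nyquist, 801)
spectrum = np.exp(-((freqs - src_a[2]) / 100.0) ** 2)
warped = warp_spectrum(spectrum, freqs, src_a, tgt_a)
peak_hz = freqs[np.argmax(warped)]
```

In this toy setup the spectral peak placed at the mean source F2 moves to roughly the mean target F2 after warping, which is the behavior the warped source features are meant to exhibit before HMM training and MLLR adaptation.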

