Sixth ISCA Workshop on Speech Synthesis
This paper proposes to use KLD between context-dependent HMMs as the target cost in unit selection TTS systems. We train context-dependent HMMs to characterize the contextual attributes of units, and calculate the Kullback-Leibler Divergence (KLD) between the corresponding models. We demonstrate that the KLD measure provides a statistically meaningful way to analyze the underlying relations among elements of attributes. With the aid of multidimensional scaling, a set of attributes, including phonetic, prosodic and numerical contexts, is examined by graphically representing elements of each attribute as points in a low-dimensional space, where the distances among points agree with the KLDs among the elements. The KLD between multi-space probability distribution HMMs is derived. A perceptual experiment shows that the TTS system defined with the KLD-based target cost sounds slightly better than one with the manually-tuned target cost.
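The two computational steps summarized above (a KLD between models, then multidimensional scaling to visualize the resulting distance matrix) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes single diagonal-covariance Gaussian states (for which KLD has a closed form) rather than full context-dependent HMMs, symmetrizes the KLD to obtain a distance-like matrix, and uses classical MDS via eigendecomposition. All function names are illustrative.

```python
import numpy as np

def gaussian_kld(mu0, var0, mu1, var1):
    # Closed-form KL(p || q) between two diagonal-covariance Gaussians.
    # mu*, var* are 1-D arrays of means and variances.
    return 0.5 * np.sum(
        np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0
    )

def symmetric_kld(mu0, var0, mu1, var1):
    # KLD is asymmetric; summing both directions gives a symmetric
    # dissimilarity suitable as input to MDS.
    return (gaussian_kld(mu0, var0, mu1, var1)
            + gaussian_kld(mu1, var1, mu0, var0))

def classical_mds(D, k=2):
    # Classical MDS: embed n items as points in k dimensions so that
    # Euclidean distances approximate the dissimilarities in D (n x n).
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    w, V = np.linalg.eigh(B)                 # eigendecomposition (symmetric B)
    idx = np.argsort(w)[::-1][:k]            # top-k eigenvalues
    L = np.sqrt(np.maximum(w[idx], 0.0))     # clamp numerical negatives
    return V[:, idx] * L                     # n x k coordinate matrix

# Toy example: three "attribute elements", each modeled by one Gaussian.
mus = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
vars_ = [np.ones(2), np.ones(2), np.ones(2)]
D = np.array([[symmetric_kld(mus[i], vars_[i], mus[j], vars_[j])
               for j in range(3)] for i in range(3)])
coords = classical_mds(D, k=2)  # 2-D points whose distances mirror the KLDs
```

With unit variances the symmetric KLD here reduces to the squared Euclidean distance between means, so the MDS embedding recovers the relative geometry of the three elements; in the paper's setting the distance matrix would instead come from KLDs between context-dependent (multi-space distribution) HMMs.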
Bibliographic reference. Zhao, Yong / Zhang, Chengsuo / Soong, Frank K. / Chu, Min / Xiao, Xi (2007): "Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis", In SSW6-2007, 206-210.