This paper proposes to use the Kullback-Leibler divergence (KLD) between context-dependent HMMs as the target cost in unit selection TTS systems. We train context-dependent HMMs to characterize the contextual attributes of units and calculate the KLD between the corresponding models. We demonstrate that the KLD measure provides a statistically meaningful way to analyze the underlying relations among elements of attributes. With the aid of multidimensional scaling, a set of attributes, including phonetic, prosodic, and numerical contexts, is examined by graphically representing the elements of each attribute as points in a low-dimensional space, where the distances among points agree with the KLDs among the elements. The KLD between multi-space probability distribution HMMs is derived. A perceptual experiment shows that the TTS system with the KLD-based target cost sounds slightly better than one with a manually-tuned target cost.
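To illustrate the overall pipeline, here is a minimal Python sketch, assuming diagonal-covariance Gaussian state distributions and a state-wise symmetric KLD approximation between aligned left-to-right HMMs; the paper itself derives the KLD for multi-space probability distribution HMMs, which this sketch does not reproduce. The toy models, dimensions, and the scikit-learn MDS call are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.manifold import MDS

def kld_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(var_q / var_p)
                        + (var_p + (mu_p - mu_q) ** 2) / var_q
                        - 1.0)

def symmetric_kld_hmm(states_p, states_q):
    """Symmetric KLD between two HMMs, approximated by summing
    state-wise Gaussian KLDs (assumes matched state alignment)."""
    total = 0.0
    for (mu_p, var_p), (mu_q, var_q) in zip(states_p, states_q):
        total += kld_diag_gaussian(mu_p, var_p, mu_q, var_q)
        total += kld_diag_gaussian(mu_q, var_q, mu_p, var_p)
    return total

# Hypothetical toy data: 3 context-dependent models,
# each with 2 states over 4-dimensional features.
rng = np.random.default_rng(0)
models = [[(rng.normal(size=4), rng.uniform(0.5, 2.0, size=4))
           for _ in range(2)] for _ in range(3)]

# Pairwise symmetric KLD matrix among the attribute elements.
n = len(models)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = symmetric_kld_hmm(models[i], models[j])

# Multidimensional scaling: embed elements as 2-D points whose
# Euclidean distances approximate the KLDs.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
print(coords)
```

In practice the pairwise KLDs would come from trained context-dependent HMMs rather than random toy parameters, and the resulting 2-D coordinates could be plotted to inspect how elements of a contextual attribute cluster.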
Cite as: Zhao, Y., Zhang, C., Soong, F.K., Chu, M., Xiao, X. (2007) Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 206-210
@inproceedings{zhao07_ssw,
  author={Yong Zhao and Chengsuo Zhang and Frank K. Soong and Min Chu and Xi Xiao},
  title={{Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={206--210}
}