Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Measuring Attribute Dissimilarity with HMM KL-Divergence for Speech Synthesis

Yong Zhao (1), Chengsuo Zhang (2), Frank K. Soong (1), Min Chu (1), Xi Xiao (2)

(1) Speech Group, Microsoft Research Asia, China
(2) Department of Electronic Engineering, Tsinghua University, China

This paper proposes using the Kullback-Leibler divergence (KLD) between context-dependent HMMs as the target cost in unit-selection TTS systems. We train context-dependent HMMs to characterize the contextual attributes of units and calculate the KLD between the corresponding models. We demonstrate that the KLD measure provides a statistically meaningful way to analyze the underlying relations among elements of an attribute. With the aid of multidimensional scaling, a set of attributes, including phonetic, prosodic, and numerical contexts, is examined by graphically representing the elements of each attribute as points in a low-dimensional space, where the distances among points agree with the KLDs among the elements. The KLD between multi-space probability distribution HMMs is also derived. A perceptual experiment shows that the TTS system using the KLD-based target cost sounds slightly better than one with a manually tuned target cost.
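The pipeline the abstract outlines can be sketched in a simplified form: compute symmetrized KLDs among a set of models, then embed the resulting distance matrix in a low-dimensional space with classical multidimensional scaling. This is a minimal illustration, not the paper's implementation: it assumes each model is reduced to a single diagonal-covariance Gaussian (the paper's full derivation covers context-dependent HMMs and multi-space probability distributions), and the function names are chosen here for illustration.

```python
import numpy as np

def gaussian_kld(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def symmetric_kld_matrix(means, variances):
    """Pairwise symmetrized KLD among a set of Gaussian models
    (a stand-in for the per-attribute HMM comparisons in the paper)."""
    n = len(means)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            kl = (gaussian_kld(means[i], variances[i], means[j], variances[j])
                  + gaussian_kld(means[j], variances[j], means[i], variances[i]))
            d[i, j] = d[j, i] = 0.5 * kl
    return d

def classical_mds(d, k=2):
    """Embed a symmetric distance matrix into k dimensions so that
    inter-point distances approximate the input distances."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]       # top-k eigenpairs
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

Each attribute element (e.g. a phonetic or prosodic context class) then appears as one point in the embedded space, and nearby points indicate contexts whose models are nearly interchangeable.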


Bibliographic reference.  Zhao, Yong / Zhang, Chengsuo / Soong, Frank K. / Chu, Min / Xiao, Xi (2007): "Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis", In SSW6-2007, 206-210.