ISCA Archive SSW 2007

Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis

Yong Zhao, Chengsuo Zhang, Frank K. Soong, Min Chu, Xi Xiao

This paper proposes using the Kullback-Leibler divergence (KLD) between context-dependent HMMs as the target cost in unit-selection TTS systems. We train context-dependent HMMs to characterize the contextual attributes of units and calculate the KLD between the corresponding models. We demonstrate that the KLD measure provides a statistically meaningful way to analyze the underlying relations among elements of attributes. With the aid of multidimensional scaling, a set of attributes, including phonetic, prosodic, and numerical contexts, is examined by graphically representing the elements of each attribute as points in a low-dimensional space, where the distances among points agree with the KLDs among the elements. The KLD between multi-space probability distribution HMMs is also derived. A perceptual experiment shows that the TTS system with the KLD-based target cost sounds slightly better than one with a manually-tuned cost.
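The building block of such an HMM-level divergence is the closed-form KLD between the Gaussian state distributions. As a minimal sketch (not the paper's exact derivation; function and variable names are illustrative), the KLD between two multivariate Gaussians N(mu0, sigma0) and N(mu1, sigma1) can be computed as:

```python
import numpy as np

def gaussian_kld(mu0, sigma0, mu1, sigma1):
    """Closed-form KL divergence KL(N0 || N1) between two multivariate
    Gaussians. Per-state divergences like this are the usual building
    block when approximating the KLD between two HMMs."""
    k = mu0.shape[0]                      # dimensionality
    inv1 = np.linalg.inv(sigma1)          # inverse of the second covariance
    diff = mu1 - mu0                      # mean difference vector
    return 0.5 * (np.trace(inv1 @ sigma0)
                  + diff @ inv1 @ diff
                  - k
                  + np.log(np.linalg.det(sigma1) / np.linalg.det(sigma0)))

# Identical models have zero divergence; shifting the mean increases it.
mu = np.zeros(2)
cov = np.eye(2)
print(gaussian_kld(mu, cov, mu, cov))            # -> 0.0
print(gaussian_kld(mu, cov, np.ones(2), cov))    # -> 1.0
```

A pairwise matrix of such divergences between the models of attribute elements is exactly the kind of dissimilarity input that multidimensional scaling can embed as points in a low-dimensional space.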


Cite as: Zhao, Y., Zhang, C., Soong, F.K., Chu, M., Xiao, X. (2007) Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 206-210

@inproceedings{zhao07_ssw,
  author={Yong Zhao and Chengsuo Zhang and Frank K. Soong and Min Chu and Xi Xiao},
  title={{Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={206--210}
}