In this paper, we propose a new objective evaluation method for hidden Markov model (HMM)-based speech synthesis using Kullback-Leibler divergence (KLD). The KLD is used to measure the difference between the probability density functions (PDFs) of the acoustic feature vectors extracted from natural training and synthetic speech data. For the evaluation, Gaussian mixture model (GMM) is used to model the distribution of acoustic feature vectors, including the fundamental frequency (F0). Continuous F0, obtained with linear interpolation, is used in the evaluation. In essence, the KLD is the expectation of the logarithmic difference between the likelihoods calculated on training and synthetic speech. This likelihood difference is appropriate to characterize the quality of a HMM-based speech synthesis system in generating synthetic speech using a maximum likelihood criterion. The objective evaluation is tested with 3 different HMM-based speech synthesis systems which use multi-space distribution (MSD) to model discontinuous F0. These systems are trained on a common speech corpus in French. We propose an index to evaluate HMM-based speech synthesis system which takes into account the relative variation of the KLDs on test sets of synthetic and natural speech. This index correlates inversely with the result of the MOS (mean opinion score) perceptual test.
Bibliographic reference. Do, C. -T. / Evrard, M. / Leman, A. / d'Alessandro, Christophe / Rilliard, Albert / Crebouw, J. -L. (2014): "Objective evaluation of HMM-based speech synthesis system using kullback-leibler divergence", In INTERSPEECH-2014, 2952-2956.