This paper analyzes the problem of the spectral enhancement technique using global variance (GV) in HMM-based speech synthesis. In the conventional GV-based parameter generation, spectral enhancement with variance compensation is achieved by considering a GV pdf with fixed parameters for every output utterances through the generation process. Although the spectral peaks of the generated trajectory are clearly emphasized and subjective clarity is improved, the use of the fixed GV parameters results in a much smaller variation of GVs among the synthesized utterances than that of the natural speech, which sometimes causes undesirable effect. In this paper, we examine the above problem in terms of multiple objective measures such as variance characteristics, spectral and GV distortions, and GV correlations and discuss the result. We propose a simple alternative technique based on an affine transformation that provides a closer GV distribution to the original speech and improves the correlation of GVs of generated parameter sequences. The experimental results show that the proposed spectral enhancement outperforms the conventional GV-based parameter generation in the objective measures.
Bibliographic reference. Nose, Takashi / Ito, Akinori (2014): "Analysis of spectral enhancement using global variance in HMM-based speech synthesis", In INTERSPEECH-2014, 2917-2921.