In this paper, we propose parameter generation methods using rich context models in HMM-based speech synthesis to improve the quality of synthetic speech while retaining the capability to model acoustic features flexibly. In traditional HMM-based speech synthesis, the generated speech parameters tend to be excessively smoothed, which causes synthetic speech to sound muffled. To alleviate this problem, several hybrid methods combining HMM-based speech synthesis and unit selection synthesis have been proposed. Rich context modeling is one such hybrid method; it represents acoustic inventories with probability density functions. To make it as flexible as the original HMM-based speech synthesis, we propose novel parameter generation methods using rich context models. The rich context models are reformulated as GMMs, and parameter generation is performed based on the maximum likelihood criterion. We conduct several experimental evaluations of the proposed methods from various perspectives. The experimental results demonstrate that the proposed methods yield significant improvements in the quality of synthetic speech.
Index Terms: HMM-based speech synthesis, over-smoothing, rich context model, parameter generation
Bibliographic reference. Takamichi, Shinnosuke / Toda, Tomoki / Shiga, Yoshinori / Kawai, Hisashi / Sakti, Sakriani / Nakamura, Satoshi (2012): "An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis", In INTERSPEECH-2012, 1139-1142.