INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis

Shinnosuke Takamichi (1), Tomoki Toda (1), Yoshinori Shiga (2), Hisashi Kawai (2), Sakriani Sakti (1), Satoshi Nakamura (1)

(1) Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan
(2) National Institute of Information and Communications Technology, Japan

In this paper, we propose parameter generation methods using rich context models in HMM-based speech synthesis to improve the quality of synthetic speech while retaining the capability of flexibly modeling acoustic features. In traditional HMM-based speech synthesis, the generated speech parameters tend to be excessively smoothed, and using them causes muffled sounds in synthetic speech. To alleviate this problem, several hybrid methods combining HMM-based speech synthesis and unit selection synthesis have been proposed. Rich context modeling is one such hybrid method, representing acoustic inventories with probability density functions. To make it as flexible as the original HMM-based speech synthesis, novel parameter generation methods using rich context models are proposed. The rich context models are reformed as GMMs, and parameter generation based on the maximum likelihood criterion is performed. We conduct several experimental evaluations of the proposed methods from various perspectives. The experimental results demonstrate that the proposed methods yield significant improvements in the quality of synthetic speech.

Index Terms: HMM-based speech synthesis, over-smoothing, rich context model, parameter generation
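The maximum-likelihood parameter generation mentioned in the abstract follows the standard MLPG formulation: given frame-wise Gaussian means and variances over static and delta features, solve for the static trajectory that maximizes likelihood under the dynamic-feature constraint. As a rough illustrative sketch only (not the authors' implementation), the following assumes a single one-dimensional feature stream, diagonal covariances, and a simple `(-0.5, 0, 0.5)` delta window; the function name `mlpg` is hypothetical:

```python
import numpy as np

def mlpg(means, variances, delta_win=(-0.5, 0.0, 0.5)):
    """Illustrative maximum-likelihood parameter generation (1-D stream).

    means, variances: arrays of shape (T, 2) holding the static and delta
    means/variances predicted for each frame t (diagonal covariance assumed).
    Returns the static trajectory c of shape (T,) maximizing the likelihood
    under the constraint that deltas are computed from neighbouring frames.
    """
    T = means.shape[0]
    # Build the window matrix W: rows 0..T-1 pick the static value,
    # rows T..2T-1 apply the delta window to neighbouring frames
    # (out-of-range neighbours at the boundaries are simply dropped).
    W = np.zeros((2 * T, T))
    W[:T, :] = np.eye(T)
    for t in range(T):
        for k, w in zip((-1, 0, 1), delta_win):
            if 0 <= t + k < T:
                W[T + t, t + k] = w
    mu = means.T.reshape(-1)               # stacked [static; delta] means
    prec = 1.0 / variances.T.reshape(-1)   # diagonal precisions D^-1
    A = W.T @ (prec[:, None] * W)          # W' D^-1 W
    b = W.T @ (prec * mu)                  # W' D^-1 mu
    return np.linalg.solve(A, b)          # c = (W' D^-1 W)^-1 W' D^-1 mu
```

If the delta means are exactly consistent with the static means, the recovered trajectory reproduces the static means; when they conflict, the solver trades the two off according to their precisions, which is the source of the smoothing behaviour the paper aims to mitigate.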


Bibliographic reference.  Takamichi, Shinnosuke / Toda, Tomoki / Shiga, Yoshinori / Kawai, Hisashi / Sakti, Sakriani / Nakamura, Satoshi (2012): "An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis", In INTERSPEECH-2012, 1139-1142.