14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Improvements to HMM-Based Speech Synthesis Based on Parameter Generation with Rich Context Models

Shinnosuke Takamichi (1), Tomoki Toda (1), Yoshinori Shiga (2), Sakriani Sakti (1), Graham Neubig (1), Satoshi Nakamura (1)

(1) NAIST, Japan
(2) NICT, Japan

In this paper, we improve parameter generation with rich context models by modifying the initialization method, and we further apply it to both the spectral and F0 components in HMM-based speech synthesis. To alleviate the over-smoothing effects caused by traditional parameter generation methods, we previously proposed an iterative parameter generation method with rich context models. This method has been reported to improve the quality of synthetic speech, but it still has two limitations: 1) it still suffers from the over-smoothing effect, because it uses the parameters generated by the traditional method as initial parameters, which strongly affect the finally generated parameters, and 2) it is applied only to the spectral component. To address these issues, we propose 1) an initialization method that generates less smoothed but more discontinuous initial parameters, which tend to yield better generated parameters, and 2) a parameter generation method with rich context models for the F0 component. Experimental results show that the proposed methods yield significant improvements in the quality of synthetic speech.

Bibliographic reference.  Takamichi, Shinnosuke / Toda, Tomoki / Shiga, Yoshinori / Sakti, Sakriani / Neubig, Graham / Nakamura, Satoshi (2013): "Improvements to HMM-based speech synthesis based on parameter generation with rich context models", In INTERSPEECH-2013, 364-368.