15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

DNN-Based Stochastic Postfilter for HMM-Based Speech Synthesis

Ling-Hui Chen (1), Tuomo Raitio (2), Cassia Valentini-Botinhao (3), Junichi Yamagishi (3), Zhen-Hua Ling (1)

(1) USTC, China
(2) Aalto University, Finland
(3) University of Edinburgh, UK

In this paper, we propose a deep neural network to model the conditional probability of the spectral differences between natural and synthetic speech. This allows us to reconstruct the spectral fine structures in speech generated by HMMs. We compared the new stochastic, data-driven postfilter with global-variance-based parameter generation and modulation spectrum enhancement. Our results confirm that the proposed method significantly improves the segmental quality of synthetic speech compared to the conventional methods.

Bibliographic reference. Chen, Ling-Hui / Raitio, Tuomo / Valentini-Botinhao, Cassia / Yamagishi, Junichi / Ling, Zhen-Hua (2014): "DNN-based stochastic postfilter for HMM-based speech synthesis", In INTERSPEECH-2014, 1954-1958.