The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Statistical Parametric Speech Synthesis with Joint Estimation of Acoustic and Excitation Model Parameters

Ranniery Maia, Heiga Zen, M. J. F. Gales

Toshiba Research Europe Ltd., Cambridge Research Laboratory, Cambridge, UK

This paper describes a novel framework for statistical parametric speech synthesis in which statistical modeling of the speech waveform is performed through the joint estimation of acoustic and excitation model parameters. The proposed method combines extraction of spectral parameters, treated as hidden variables, with excitation signal modeling, in a fashion similar to the factor-analyzed trajectory hidden Markov model. The resulting joint model can be interpreted as waveform-level closed-loop training, in which the distance between natural and synthesized speech is minimized. An algorithm based on the maximum likelihood criterion is introduced to train the proposed joint model, and experiments are presented to show its effectiveness.

Index terms: statistical parametric speech synthesis, trajectory hidden Markov model, excitation modeling, factor analysis.

Bibliographic reference. Maia, Ranniery / Zen, Heiga / Gales, M. J. F. (2010): "Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters", in SSW7-2010, 88-93.