INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Minimum Mean Squared Error Based Warped Complex Cepstrum Analysis for Statistical Parametric Speech Synthesis

Ranniery Maia (1), M. J. F. Gales (1), Yannis Stylianou (1), Masami Akamine (2)

(1) Toshiba Research Europe Ltd., UK
(2) Toshiba, Japan

This paper presents an approach to complex cepstrum analysis based on the minimum mean squared error criterion, and describes its application to statistical parametric speech synthesis. The proposed method alleviates some of the issues associated with conventional complex cepstrum analysis, such as the choice of analysis window, phase unwrapping, and the need for accurate pitch marks. Given initial estimates of the warped complex cepstra and their respective analysis instants, the method iteratively optimizes the complex cepstrum in a warped quefrency domain by minimizing the mean squared error between the natural and the reconstructed speech waveforms. When applied to statistical parametric speech synthesis, the optimized complex cepstrum yields better synthesized speech quality, especially for emotional databases, than the complex cepstrum calculated through conventional methods.
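
For orientation, the conventional analysis that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the textbook complex cepstrum (log magnitude plus unwrapped phase, with the linear-phase term removed), not the authors' warped MMSE method. The function names, the `n_fft` default, and the toy signal are assumptions made for this example. The last function only evaluates the mean squared error between a frame and its reconstruction, the quantity the proposed method is described as minimizing; no optimization or frequency warping is performed here.

```python
import numpy as np

def complex_cepstrum(x, n_fft=64):
    """Conventional complex cepstrum of a real frame x (n_fft must be even).

    Returns the cepstrum and the integer linear-phase term nd that was removed.
    """
    X = np.fft.fft(x, n_fft)
    log_mag = np.log(np.abs(X) + 1e-12)      # small floor avoids log(0)
    phase = np.unwrap(np.angle(X))           # phase unwrapping step
    nh = n_fft // 2
    nd = int(np.round(phase[nh] / np.pi))    # linear-phase (delay) term
    phase = phase - np.pi * nd * np.arange(n_fft) / nh
    c = np.fft.ifft(log_mag + 1j * phase).real
    return c, nd

def inverse_complex_cepstrum(c, nd):
    """Rebuild the time-domain frame from the cepstrum and linear-phase term."""
    n_fft = len(c)
    C = np.fft.fft(c)
    phase = C.imag + np.pi * nd * np.arange(n_fft) / (n_fft // 2)
    return np.fft.ifft(np.exp(C.real + 1j * phase)).real

def reconstruction_mse(x, c, nd):
    """Mean squared error between the zero-padded frame and its reconstruction."""
    x_pad = np.zeros(len(c))
    x_pad[:len(x)] = x
    return float(np.mean((x_pad - inverse_complex_cepstrum(c, nd)) ** 2))

# Toy minimum-phase frame (hypothetical data, not speech).
frame = np.array([1.0, 0.5])
cep, nd = complex_cepstrum(frame)
print("removed linear-phase term:", nd)
print("reconstruction MSE:", reconstruction_mse(frame, cep, nd))
```

On real speech frames, the result of this conventional pipeline depends on the window, on the reliability of `np.unwrap`, and on where the frame is centred relative to the pitch marks, which are the sensitivities the abstract lists.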


Bibliographic reference. Maia, Ranniery / Gales, M. J. F. / Stylianou, Yannis / Akamine, Masami (2013): "Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis", in INTERSPEECH-2013, 2336-2340.