Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

An Excitation Model for HMM-Based Speech Synthesis Based on Residual Modeling

Ranniery Maia (1), Tomoki Toda (1,2), Heiga Zen (3), Yoshihiko Nankaku (3), Keiichi Tokuda (1,3)

(1) National Inst. of Inform. and Comm. Tech. (NiCT) / ATR Spoken Language Comm. Labs, Japan
(2) Nara Institute of Science and Technology, Japan
(3) Nagoya Institute of Technology, Japan

This paper describes a trainable excitation approach to eliminate the unnaturalness of HMM-based speech synthesizers. In the waveform generation part, mixed excitation is constructed by state-dependent filtering of pulse trains and white noise sequences. In the training part, filters and pulse trains are jointly optimized through a procedure that resembles analysis-by-synthesis speech coding algorithms, in which likelihood maximization of residual signals (derived from the same database that is used to train the HMM-based synthesizer) is pursued. Preliminary results show that the proposed excitation model eliminates the unnaturalness of synthesized speech and is comparable in quality to the best approaches reported thus far for eradicating the buzziness of HMM-based synthesizers.
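
The waveform generation step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that both filters are short FIR filters, that each HMM state contributes one fixed pair of filters, and that the pulse train restarts at every state boundary. The function names, filter coefficients, sampling rate, and F0 value are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pulse_train(n_samples, f0_hz, fs_hz):
    """Unit pulses spaced one (rounded) pitch period apart."""
    period = int(round(fs_hz / f0_hz))
    train = np.zeros(n_samples)
    train[::period] = 1.0
    return train


def fir_filter(x, h):
    """Causal FIR filtering, truncated to the input length."""
    return np.convolve(x, h)[: len(x)]


def mixed_excitation(n_samples, f0_hz, fs_hz, h_voiced, h_unvoiced, rng):
    """Filtered pulse train plus filtered white noise for one state."""
    pulses = pulse_train(n_samples, f0_hz, fs_hz)
    noise = rng.standard_normal(n_samples)
    return fir_filter(pulses, h_voiced) + fir_filter(noise, h_unvoiced)


# Hypothetical state-dependent filters: state -> (voiced, unvoiced).
# In the paper these would come from training; here they are made up.
state_filters = {
    0: (np.array([1.0, 0.5, 0.25]), np.array([0.1])),
    1: (np.array([1.0, -0.3]), np.array([0.3, -0.1])),
}

rng = np.random.default_rng(0)
fs_hz = 16000  # assumed sampling rate
excitation = np.concatenate(
    [
        mixed_excitation(400, 100.0, fs_hz, *state_filters[s], rng)
        for s in (0, 1)
    ]
)
```

Here `excitation` holds 800 samples covering two consecutive states. In a full synthesizer a signal of this kind would drive the synthesis filter in place of a plain pulse or noise source.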

Bibliographic reference.  Maia, Ranniery / Toda, Tomoki / Zen, Heiga / Nankaku, Yoshihiko / Tokuda, Keiichi (2007): "An excitation model for HMM-based speech synthesis based on residual modeling", In SSW6-2007, 131-136.