The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Spectral Modeling with Contextual Additive Structure for HMM-based Speech Synthesis

Shinji Takaki, Yoshihiko Nankaku, Keiichi Tokuda

Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan

This paper proposes a spectral modeling technique based on the additive structure of context dependencies for HMM-based speech synthesis. Contextual additive structure models can represent complicated dependencies between acoustic features and context labels by using multiple decision trees. However, the computational complexity of their context clustering is too high for the full context labels used in speech synthesis. To overcome this problem, this paper proposes two approaches: covariance parameter tying, and a likelihood calculation algorithm based on the matrix inversion lemma. Experimental results show that the proposed method outperforms the conventional method in subjective listening tests.
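The abstract does not give the model equations. The following is a minimal sketch that assumes each decision tree contributes one Gaussian, selected by the context label, and that the output distribution is their convolution (the index term "distribution convolution" suggests this), so the means and covariances add. It also illustrates, generically, how the matrix inversion lemma (Sherman-Morrison form) together with the matrix determinant lemma can update the inverse and log-determinant of the summed covariance after a rank-one change without re-inverting it. All function names and numbers are hypothetical. This is not the authors' algorithm, and the paper's actual use of the lemma during context clustering may differ.

```python
import math

# Hypothetical helpers for illustration only; not the paper's implementation.

def mat_add(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_vec(a, x):
    """Matrix-vector product a @ x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def inv_and_logdet(a):
    """Gauss-Jordan inversion with partial pivoting; also returns log(det(a)).

    Raises ValueError if det(a) <= 0, which a valid covariance never has.
    """
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    logdet, sign = 0.0, 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if piv != col:                      # a row swap flips the determinant sign
            m[col], m[piv] = m[piv], m[col]
            sign = -sign
        p = m[col][col]
        if p == 0.0:
            raise ValueError("singular matrix")
        if p < 0.0:
            sign = -sign
        logdet += math.log(abs(p))          # det(a) = product of pivots (up to sign)
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    if sign < 0.0:
        raise ValueError("negative determinant; not a covariance matrix")
    return [row[n:] for row in m], logdet

def gaussian_loglik(x, mean, cov_inv, logdet):
    """log N(x; mean, cov) given cov^{-1} and log det(cov)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    quad = sum(di * yi for di, yi in zip(d, mat_vec(cov_inv, d)))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + logdet + quad)

def additive_gaussian(leaves):
    """Convolve the Gaussians chosen by several trees: means and covariances add."""
    mean = [0.0] * len(leaves[0][0])
    cov = [[0.0] * len(mean) for _ in mean]
    for mu, sigma in leaves:
        mean = [a + b for a, b in zip(mean, mu)]
        cov = mat_add(cov, sigma)
    return mean, cov

def rank_one_update(a_inv, logdet, u, v):
    """Inverse and log-det of A + u v^T from A^{-1} and log det(A).

    Sherman-Morrison for the inverse, matrix determinant lemma for the log-det.
    Assumes 1 + v^T A^{-1} u > 0 (true when v == u and A is positive definite).
    """
    n = len(a_inv)
    au = mat_vec(a_inv, u)                                               # A^{-1} u
    va = [sum(v[i] * a_inv[i][j] for i in range(n)) for j in range(n)]  # v^T A^{-1}
    denom = 1.0 + sum(vi * x for vi, x in zip(v, au))
    new_inv = [[a_inv[i][j] - au[i] * va[j] / denom for j in range(n)]
               for i in range(n)]
    return new_inv, logdet + math.log(denom)

# Two hypothetical trees, each contributing the leaf selected by the context.
leaves = [([0.1, -0.2], [[1.0, 0.2], [0.2, 0.5]]),
          ([0.3, 0.4], [[0.5, 0.0], [0.0, 0.5]])]
mean, cov = additive_gaussian(leaves)
cov_inv, logdet = inv_and_logdet(cov)
ll = gaussian_loglik([0.5, 0.0], mean, cov_inv, logdet)

# Rank-one change to the summed covariance: fast update vs. direct re-inversion.
u = [0.3, -0.1]
upd_inv, upd_logdet = rank_one_update(cov_inv, logdet, u, u)
ref_inv, ref_logdet = inv_and_logdet(
    [[cov[i][j] + u[i] * u[j] for j in range(2)] for i in range(2)])
```

A rank-one Sherman-Morrison update costs O(d²) operations instead of the O(d³) of a fresh inversion, which is the kind of saving that matters when a clustering procedure must evaluate many candidate splits.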

Index Terms: Hidden Markov models, Spectral modeling, Decision trees, Context clustering, Additive structure, Distribution convolution


Bibliographic reference: Takaki, Shinji / Nankaku, Yoshihiko / Tokuda, Keiichi (2010): "Spectral modeling with contextual additive structure for HMM-based speech synthesis", In SSW7-2010, 100-105.