The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis
This paper proposes a spectral modeling technique based on an additive structure of context dependencies for HMM-based speech synthesis. Contextual additive structure models can represent complicated dependencies between acoustic features and context labels using multiple decision trees. However, the computational complexity of context clustering in these models is too high for the full context labels used in speech synthesis. To overcome this problem, this paper proposes two approaches: covariance parameter tying and a likelihood calculation algorithm using the matrix inversion lemma. Experimental results show that the proposed method outperforms the conventional one in subjective listening tests.
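The abstract does not spell out the likelihood calculation algorithm, so the following is only a minimal sketch of the matrix inversion lemma itself, in its rank-one (Sherman-Morrison) form: the inverse of diag(d) + u u^T is obtained from the cheap inverse of the diagonal part, without a general matrix inversion. The function names and the example values of `d` and `u` are assumptions chosen for illustration and do not come from the paper.

```python
def sherman_morrison_inverse(d, u):
    """Inverse of diag(d) + u u^T using the rank-one matrix inversion lemma.

    (D + u u^T)^{-1} = D^{-1} - (D^{-1} u u^T D^{-1}) / (1 + u^T D^{-1} u)
    """
    n = len(d)
    w = [u[i] / d[i] for i in range(n)]                # D^{-1} u
    denom = 1.0 + sum(u[i] * w[i] for i in range(n))   # 1 + u^T D^{-1} u
    inv = [[-w[i] * w[j] / denom for j in range(n)] for i in range(n)]
    for i in range(n):
        inv[i][i] += 1.0 / d[i]                        # add the D^{-1} term
    return inv


def matmul(a, b):
    """Plain-Python matrix product, used here only to check the result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Illustrative values (hypothetical, not taken from the paper).
d = [2.0, 4.0, 5.0]
u = [1.0, 2.0, 3.0]

# Build the full matrix diag(d) + u u^T and multiply it by the inverse
# computed through the lemma; the product should be the identity matrix
# up to floating-point rounding.
a = [[(d[i] if i == j else 0.0) + u[i] * u[j] for j in range(3)]
     for i in range(3)]
product = matmul(a, sherman_morrison_inverse(d, u))
```

The update costs O(n^2) operations once the diagonal inverse is known, which is the general reason the lemma is attractive when many candidate models differ from one another only by a low-rank term.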
Index Terms: Hidden Markov models, Spectral modeling, Decision trees, Context clustering, Additive structure, Distribution convolution
Bibliographic reference. Takaki, Shinji / Nankaku, Yoshihiko / Tokuda, Keiichi (2010): "Spectral modeling with contextual additive structure for HMM-based speech synthesis", In SSW7-2010, 100-105.