ISCA Archive SSW 2010

Spectral modeling with contextual additive structure for HMM-based speech synthesis

Shinji Takaki, Yoshihiko Nankaku, Keiichi Tokuda

This paper proposes a spectral modeling technique based on an additive structure of context dependencies for HMM-based speech synthesis. Contextual additive structure models can represent complicated dependencies between acoustic features and context labels using multiple decision trees. However, the computational complexity of their context clustering is too high for the full-context labels used in speech synthesis. To overcome this problem, this paper proposes two approaches: covariance parameter tying and a likelihood calculation algorithm using the matrix inversion lemma. Experimental results show that the proposed method outperforms the conventional one in subjective listening tests.

Index Terms: Hidden Markov models, Spectral modeling, Decision trees, Context clustering, Additive structure, Distribution convolution


Cite as: Takaki, S., Nankaku, Y., Tokuda, K. (2010) Spectral modeling with contextual additive structure for HMM-based speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 100-105

@inproceedings{takaki10_ssw,
  author={Shinji Takaki and Yoshihiko Nankaku and Keiichi Tokuda},
  title={{Spectral modeling with contextual additive structure for HMM-based speech synthesis}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={100--105}
}