ISCA Archive Interspeech 2009

A Bayesian approach to Hidden Semi-Markov Model based speech synthesis

Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

This paper proposes a Bayesian approach to hidden semi-Markov model (HSMM) based speech synthesis. A Bayesian approach to hidden Markov model (HMM) based speech synthesis was recently proposed. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. In this approach, all processes for constructing the system are derived from a single predictive distribution that exactly represents the problem of speech synthesis. However, that system has an inconsistency between training and synthesis: although speech is synthesized from HMMs with explicit state duration probability distributions, the HMMs are trained without them. In this paper, we introduce an HSMM, which is an HMM with explicit state duration probability distributions, into the HMM-based Bayesian speech synthesis system. Experimental results show that the use of the HSMM improves the naturalness of the synthesized speech.
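
The difference between implicit and explicit state durations mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the discretized Gaussian used for the explicit duration distribution, and all parameter values are assumptions chosen for illustration only. A standard HMM state with self-transition probability a implies a geometric duration distribution, p(d) = a^(d-1)(1-a), whose most likely duration is always one frame, whereas an explicit duration distribution can place its peak at any duration.

```python
import math


def hmm_duration_pmf(self_loop_prob, max_d):
    """Implicit duration pmf of a standard HMM state (geometric), for d = 1..max_d."""
    return [
        (self_loop_prob ** (d - 1)) * (1.0 - self_loop_prob)
        for d in range(1, max_d + 1)
    ]


def hsmm_duration_pmf(mean, std, max_d):
    """Explicit duration pmf for d = 1..max_d.

    Assumption: a Gaussian evaluated at integer durations and renormalized.
    This is an illustrative choice, not taken from the paper.
    """
    weights = [math.exp(-0.5 * ((d - mean) / std) ** 2) for d in range(1, max_d + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mode(pmf):
    """Return the most probable duration (durations start at 1)."""
    return max(range(len(pmf)), key=pmf.__getitem__) + 1


# Hypothetical values: both distributions are centred near 5 frames.
hmm_pmf = hmm_duration_pmf(self_loop_prob=0.8, max_d=30)   # geometric mean 1 / (1 - 0.8) = 5
hsmm_pmf = hsmm_duration_pmf(mean=5.0, std=1.5, max_d=30)

print("most likely HMM duration: ", mode(hmm_pmf))
print("most likely HSMM duration:", mode(hsmm_pmf))
```

Under these assumed values the geometric distribution still favors a one-frame stay even though its mean is about five frames, while the explicit distribution peaks at its mean. This is one way to see why training without explicit durations and synthesizing with them are not the same model.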


doi: 10.21437/Interspeech.2009-141

Cite as: Hashimoto, K., Nankaku, Y., Tokuda, K. (2009) A Bayesian approach to Hidden Semi-Markov Model based speech synthesis. Proc. Interspeech 2009, 1751-1754, doi: 10.21437/Interspeech.2009-141

@inproceedings{hashimoto09_interspeech,
  author={Kei Hashimoto and Yoshihiko Nankaku and Keiichi Tokuda},
  title={{A Bayesian approach to Hidden Semi-Markov Model based speech synthesis}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1751--1754},
  doi={10.21437/Interspeech.2009-141}
}