10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Bayesian Approach to Hidden Semi-Markov Model Based Speech Synthesis

Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Nagoya Institute of Technology, Japan

This paper proposes a Bayesian approach to hidden semi-Markov model (HSMM) based speech synthesis. Recently, a Bayesian approach to hidden Markov model (HMM) based speech synthesis was proposed. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. In the Bayesian approach, all processes for constructing the system are derived from a single predictive distribution that exactly represents the problem of speech synthesis. However, there is an inconsistency between training and synthesis: although speech is synthesized from HMMs with explicit state duration probability distributions, the HMMs are trained without them. In this paper, we introduce an HSMM, which is an HMM with explicit state duration probability distributions, into the HMM-based Bayesian speech synthesis system. Experimental results show that the use of HSMMs improves the naturalness of the synthesized speech.
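The training/synthesis inconsistency can be illustrated with a small sketch. In a standard HMM, the time spent in a state is governed only by its self-transition probability a, which implies a geometric duration distribution P(d) = a^(d-1)(1 - a). Its most probable duration is always one frame. An HSMM replaces this implicit distribution with an explicit one. Here, the explicit distribution is assumed to be a discretized Gaussian truncated to 1..max_d. This is an illustrative assumption, not the paper's exact formulation, and the function names are hypothetical.

```python
import math


def hmm_implicit_duration(d: int, a_self: float) -> float:
    """Probability of staying exactly d frames in an HMM state.

    With self-transition probability a_self, the implicit state duration
    distribution is geometric: a_self**(d-1) * (1 - a_self).
    """
    if d < 1:
        return 0.0
    return a_self ** (d - 1) * (1.0 - a_self)


def hsmm_explicit_duration(d: int, mean: float, var: float, max_d: int) -> float:
    """Hypothetical explicit HSMM duration pmf (illustrative assumption).

    A Gaussian density evaluated at integer durations 1..max_d and
    renormalized so the probabilities sum to one.
    """
    if d < 1 or d > max_d:
        return 0.0
    weights = [math.exp(-((k - mean) ** 2) / (2.0 * var)) for k in range(1, max_d + 1)]
    return weights[d - 1] / sum(weights)


if __name__ == "__main__":
    a = 0.8  # implicit mean duration 1 / (1 - a) = 5 frames
    geo = [hmm_implicit_duration(d, a) for d in range(1, 21)]
    gau = [hsmm_explicit_duration(d, 5.0, 2.0, 20) for d in range(1, 21)]
    print(geo.index(max(geo)) + 1)  # mode of the geometric duration: 1
    print(gau.index(max(gau)) + 1)  # mode of the explicit duration: 5
```

Both models in this sketch have a mean duration of about five frames. Only the explicit distribution, however, places most of its mass near that mean. This gap is why training without explicit durations and synthesizing with them is inconsistent.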


Bibliographic reference.  Hashimoto, Kei / Nankaku, Yoshihiko / Tokuda, Keiichi (2009): "A Bayesian approach to Hidden Semi-Markov Model based speech synthesis", In INTERSPEECH-2009, 1751-1754.