INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis

Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Nagoya Institute of Technology, Japan

This paper investigates a multi-speaker modeling technique with shared prior distributions and model structures for Bayesian speech synthesis. In HMM-based speech synthesis, the quality of synthesized speech is improved by selecting appropriate model structures. The Bayesian approach is known to work well for such model selection. However, the result is strongly affected by the prior distributions of the model parameters. Therefore, the determination of prior distributions and the selection of model structures should be performed simultaneously. This paper investigates prior distributions and model structures in the situation where training data from multiple speakers are available. Prior distributions and model structures that represent acoustic features common to all speakers can be obtained by sharing them among multiple speaker-dependent models.
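
The dependence of Bayesian structure selection on the prior can be seen in a toy setting. The sketch below is not the method of the paper: it assumes one-dimensional Gaussian observations with known unit variance and a Gaussian prior on the mean, and it compares two hypothetical structures for two contexts ("tied": one shared mean; "split": one mean per context) by their log marginal likelihoods. The function names, the data, and the prior settings are illustrative assumptions.

```python
import math


def log_evidence(xs, prior_mean, prior_var, noise_var=1.0):
    """Log marginal likelihood of xs for x_i ~ N(mu, noise_var), mu ~ N(prior_mean, prior_var).

    Integrating out mu gives a joint Gaussian over xs with covariance
    noise_var * I + prior_var * 1 1^T, which is evaluated here in closed form.
    """
    n = len(xs)
    devs = [x - prior_mean for x in xs]
    sum_d = sum(devs)
    sum_d2 = sum(d * d for d in devs)
    total_var = noise_var + n * prior_var
    # log-determinant and quadratic form of the marginal covariance
    log_det = (n - 1) * math.log(noise_var) + math.log(total_var)
    quad = (sum_d2 - prior_var * sum_d ** 2 / total_var) / noise_var
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def select_structure(ctx_a, ctx_b, prior_mean, prior_var):
    """Return 'tied' or 'split', whichever structure has the higher log evidence."""
    tied = log_evidence(ctx_a + ctx_b, prior_mean, prior_var)
    split = (log_evidence(ctx_a, prior_mean, prior_var)
             + log_evidence(ctx_b, prior_mean, prior_var))
    return "split" if split > tied else "tied"


# Hypothetical observations for two contexts (made-up numbers).
ctx_a = [-1.2, -1.0, -0.8]
ctx_b = [0.8, 1.0, 1.2]

for prior_var in (1.0, 1e4):
    print(prior_var, select_structure(ctx_a, ctx_b, 0.0, prior_var))
```

With these numbers, the moderately informative prior (variance 1) selects the split structure, while the very broad prior (variance 10^4) selects the tied structure, even though the data are identical. This is the sensitivity that motivates determining the prior and the structure together.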

Bibliographic reference.  Hashimoto, Kei / Nankaku, Yoshihiko / Tokuda, Keiichi (2011): "Multi-speaker modeling with shared prior distributions and model structures for Bayesian speech synthesis", In INTERSPEECH-2011, 113-116.