This paper investigates a multi-speaker modeling technique with shared prior distributions and model structures for Bayesian speech synthesis. In HMM-based speech synthesis, the quality of synthesized speech is improved by selecting appropriate model structures. The Bayesian approach is known to be effective for such model selection; however, its result is strongly affected by the prior distributions of the model parameters. Therefore, the determination of prior distributions and the selection of model structures should be performed simultaneously. This paper investigates prior distributions and model structures in the situation where training data from multiple speakers are available. Prior distributions and model structures that represent acoustic features common to all speakers can be obtained by sharing them among multiple speaker-dependent models.
Bibliographic reference. Hashimoto, Kei / Nankaku, Yoshihiko / Tokuda, Keiichi (2011): "Multi-speaker modeling with shared prior distributions and model structures for Bayesian speech synthesis", In INTERSPEECH-2011, 113-116.