10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Close Look into the Probabilistic Concatenation Model for Corpus-Based Speech Synthesis

Shinsuke Sakai, Ranniery Maia, Hisashi Kawai, Satoshi Nakamura

NICT, Japan

We have proposed a novel probabilistic approach to concatenation modeling for corpus-based speech synthesis, where the goodness of concatenation for a unit is modeled using a conditional Gaussian probability density whose mean is defined as a linear transform of the feature vector from the previous unit. The effectiveness of this approach has been shown through a subjective listening test. In this paper, we further investigate the characteristics of the proposed method through an objective evaluation and by observing the sequence of concatenation scores across an utterance. We also present the mathematical relationships between the proposed method and other approaches, and show that it has flexible modeling power, with other concatenation scoring methods arising as special cases.
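As a rough illustration of the kind of score described above, the following minimal sketch evaluates the log-density of the current unit's feature vector under a Gaussian whose mean is a linear transform of the previous unit's feature vector. The function name `concat_log_score`, the diagonal covariance, and the plain-list representation of the matrix `A` are assumptions made for this example; the abstract does not specify the covariance structure, the features, or how the parameters are estimated.

```python
import math


def concat_log_score(prev_feat, cur_feat, A, var):
    """Log-density of cur_feat under N(A @ prev_feat, diag(var)).

    prev_feat: feature vector from the previous unit (list of floats)
    cur_feat:  feature vector from the current unit (list of floats)
    A:         linear transform as a list of rows (assumed given, e.g. trained)
    var:       per-dimension variances (diagonal covariance is an assumption)
    """
    log_p = 0.0
    for i, x in enumerate(cur_feat):
        # Mean of dimension i: row i of A applied to the previous unit's features.
        mean_i = sum(a * p for a, p in zip(A[i], prev_feat))
        diff = x - mean_i
        # Log of a one-dimensional Gaussian density, summed over dimensions.
        log_p += -0.5 * (math.log(2.0 * math.pi * var[i]) + diff * diff / var[i])
    return log_p


# With an identity transform and unit variances, the score depends on the
# features only through the squared Euclidean distance between them.
identity = [[1.0, 0.0], [0.0, 1.0]]
score = concat_log_score([0.0, 0.0], [3.0, 4.0], identity, [1.0, 1.0])
```

In this sketch, setting `A` to the identity and all variances to one makes the negated score equal to half the squared Euclidean distance between the two feature vectors plus a constant. This is one way a distance-based join cost can fall out as a special case of a conditional Gaussian; whether it matches the specific relationships derived in the paper is not stated in the abstract.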


Bibliographic reference.  Sakai, Shinsuke / Maia, Ranniery / Kawai, Hisashi / Nakamura, Satoshi (2009): "A close look into the probabilistic concatenation model for corpus-based speech synthesis", In INTERSPEECH-2009, 752-755.