We have proposed a novel probabilistic approach to concatenation modeling for corpus-based speech synthesis, in which the goodness of concatenation for a unit is modeled by a conditional Gaussian probability density whose mean is a linear transform of the feature vector from the previous unit. The approach was shown to be effective in a subjective listening test. In this paper, we further investigate the characteristics of the proposed method through an objective evaluation and by observing the sequence of concatenation scores across an utterance. We also present the mathematical relationships between the proposed method and other approaches, and show that it offers flexible modeling power, with other concatenation scoring methods arising as special cases.
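As a reading aid, the model described above can be sketched in equation form. The notation is an assumption, since the abstract itself gives none: x_{t-1} and x_t denote the feature vectors of the previous and current units, A is the linear transform, b is an optional offset (whether the paper includes one is not stated here), Σ is the covariance, and d is the feature dimension.

```latex
% Conditional Gaussian concatenation model (notation assumed, not from the abstract)
\[
  p(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\; A x_{t-1} + b,\; \Sigma\right)
\]
% Log-density, usable as a concatenation score
\[
  \log p(x_t \mid x_{t-1})
  = -\tfrac{1}{2}\,(x_t - A x_{t-1} - b)^{\top} \Sigma^{-1} (x_t - A x_{t-1} - b)
    - \tfrac{1}{2} \log \lvert 2\pi\Sigma \rvert
\]
% Illustrative special case: A = I, b = 0, \Sigma = \sigma^2 I
\[
  -\log p(x_t \mid x_{t-1})
  = \frac{\lVert x_t - x_{t-1} \rVert^{2}}{2\sigma^{2}} + \frac{d}{2}\log\!\left(2\pi\sigma^{2}\right)
\]
```

Under these assumptions, fixing A = I and b = 0 with an isotropic covariance reduces the negative log-density to a scaled squared Euclidean distance between adjacent feature vectors plus a constant, and keeping a general Σ gives a Mahalanobis distance instead. This illustrates how a distance-based join cost can appear as a special case; which special cases the paper actually treats is not specified in the abstract.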
Bibliographic reference. Sakai, Shinsuke / Maia, Ranniery / Kawai, Hisashi / Nakamura, Satoshi (2009): "A close look into the probabilistic concatenation model for corpus-based speech synthesis", In INTERSPEECH-2009, 752-755.