INTERSPEECH 2006 - ICSLP
The measure of the goodness, or cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of a concatenation is represented as the conditional probability of observing the spectral shape of a unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector is a linear transform of the preceding spectral shape. Parameters are tied using phonetic decision trees to make training robust, balancing model complexity against the amount of training data available. The concatenation models were implemented in a corpus-based speech synthesizer trained on a CMU Arctic database, and the effectiveness of the proposed method was confirmed by a subjective listening test.
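To make the conditional Gaussian formulation concrete, the following is a minimal sketch of how a concatenation cost could be computed as a negative log-likelihood. It is not the authors' implementation. The function name `concat_cost`, the offset term `b`, and the diagonal covariance `var` are assumptions made for illustration; the abstract states only that the mean is a linear transform of the preceding spectral shape. In the paper, the parameters would be selected per phonetic context through the decision-tree tying; here a single parameter set is passed in directly.

```python
import math

def concat_cost(x_prev, x_cur, A, b, var):
    """Negative log-likelihood of x_cur under N(A x_prev + b, diag(var)).

    x_prev, x_cur: spectral feature vectors (lists of floats) on either
                   side of the join.
    A:   transform matrix as a list of rows (hypothetical parameter name).
    b:   offset vector (assumed here; use zeros for a purely linear mean).
    var: per-dimension variances (diagonal covariance is an assumption).
    """
    cost = 0.0
    for i, (row, b_i, v_i) in enumerate(zip(A, b, var)):
        # Predicted mean of dimension i from the preceding spectral shape.
        mean_i = sum(a * x for a, x in zip(row, x_prev)) + b_i
        diff = x_cur[i] - mean_i
        # Per-dimension Gaussian negative log-density.
        cost += 0.5 * (math.log(2.0 * math.pi * v_i) + diff * diff / v_i)
    return cost
```

A lower value indicates that the current unit's spectral shape is more probable given the previous one, so a unit-selection search could in principle add this value to its other costs when ranking candidate joins.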
Bibliographic reference. Sakai, Shinsuke / Kawahara, Tatsuya (2006): "Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis", In INTERSPEECH-2006, paper 1564-Wed2A3O.2.