ISCA Archive Interspeech 2006

Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis

Shinsuke Sakai, Tatsuya Kawahara

The measure of goodness, or cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of concatenation is represented as the conditional probability of observing the spectral shape of a unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector takes the form of a linear transform of the past spectral shape. Phonetic decision tree-based parameter tying is performed to achieve robust training that balances model complexity against the amount of training data available. The concatenation models were implemented in a corpus-based speech synthesizer trained on a CMU Arctic database, and the effectiveness of the proposed method was confirmed by a subjective listening test.
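
To make the conditional Gaussian idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that spectral shapes are fixed-length feature vectors and that each tied context cluster holds a transform matrix `A`, a bias `b`, and a covariance `Sigma`. The bias term and all names (`concat_log_prob`, `concat_cost`) are illustrative assumptions that do not come from the abstract. The negative log-density can then be read as a concatenation cost.

```python
import numpy as np

def concat_log_prob(x_cur, x_prev, A, b, Sigma):
    """Log-density of x_cur under the Gaussian N(A @ x_prev + b, Sigma).

    x_cur, x_prev : spectral feature vectors of the current and previous unit
    A, b, Sigma   : hypothetical per-cluster parameters (b is an assumed bias)
    """
    d = x_cur.shape[0]
    mean = A @ x_prev + b              # mean is a linear transform of the past shape
    diff = x_cur - mean
    L = np.linalg.cholesky(Sigma)      # Sigma = L @ L.T, L lower triangular
    z = np.linalg.solve(L, diff)       # z @ z equals diff.T @ inv(Sigma) @ diff
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)

def concat_cost(x_cur, x_prev, A, b, Sigma):
    """Concatenation cost as the negative log conditional probability."""
    return -concat_log_prob(x_cur, x_prev, A, b, Sigma)

# Toy usage with made-up 2-dimensional features and identity parameters.
A = np.eye(2)
b = np.zeros(2)
Sigma = np.eye(2)
x_prev = np.array([0.5, -0.2])
smooth_join = concat_cost(np.array([0.5, -0.2]), x_prev, A, b, Sigma)
rough_join = concat_cost(np.array([2.0, 1.5]), x_prev, A, b, Sigma)
```

In this toy setting, a candidate unit whose spectral shape matches the predicted mean receives a lower cost than one that deviates from it. In a decision tree-tied system, the phonetic context would select which cluster's parameters to use.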


doi: 10.21437/Interspeech.2006-484

Cite as: Sakai, S., Kawahara, T. (2006) Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis. Proc. Interspeech 2006, paper 1564-Wed2A3O.2, doi: 10.21437/Interspeech.2006-484

@inproceedings{sakai06_interspeech,
  author={Shinsuke Sakai and Tatsuya Kawahara},
  title={{Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1564-Wed2A3O.2},
  doi={10.21437/Interspeech.2006-484}
}