Interspeech'2005 - Eurospeech
In this paper, we present a novel statistical approach to corpus-based speech synthesis. Unit selection is directed by probabilistic models for the F0 contour, duration, and spectral characteristics of the synthesis units. The F0 targets for units are modeled by statistical additive models, and duration targets are modeled by regression trees. Spectral targets for a unit are modeled by Gaussian mixtures on MFCC-based features. The goodness of concatenation of two units is modeled by conditional Gaussian models on MFCC-based features. Although the system is in an early stage of development, we implemented an English speech synthesizer with the CMU Arctic corpora and confirmed the effectiveness of this new framework.
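To illustrate how probabilistic target and concatenation models can direct unit selection, the following is a minimal sketch, not the authors' implementation. It assumes a single diagonal-covariance Gaussian per target (a simplification of the Gaussian mixtures mentioned above) and a hypothetical join model in which the left-edge features of a unit are Gaussian-distributed around the right-edge features of the preceding unit. The names `gauss_logpdf`, `select_units`, `feat`, `left`, `right`, and `join_var` are illustrative assumptions, and the search is an ordinary Viterbi pass that maximizes the summed log-probabilities.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def select_units(targets, candidates, join_var):
    """Viterbi search for the candidate sequence with the highest log-probability.

    targets:    one (mean, var) pair per position (assumed target model)
    candidates: one list of units per position; each unit is a dict with
                'feat', 'left' and 'right' feature vectors (assumed layout)
    join_var:   variance of the assumed join model,
                next['left'] ~ N(prev['right'], join_var)
    Returns the index of the chosen candidate at each position.
    """
    # score[i][j]: best log-probability of any path ending in candidates[i][j]
    score = [[gauss_logpdf(u["feat"], *targets[0]) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            target_lp = gauss_logpdf(u["feat"], *targets[i])
            # Best predecessor: previous path score plus join log-probability
            best_k, best_lp = max(
                ((k, score[i - 1][k] + gauss_logpdf(u["left"], p["right"], join_var))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda pair: pair[1])
            row.append(best_lp + target_lp)
            ptr.append(best_k)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final candidate
    j = max(range(len(score[-1])), key=lambda k: score[-1][k])
    path = [j]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```

In this sketch, a candidate whose edge features match its neighbor poorly can still be chosen if its target log-probability is high enough, since both terms are combined on the same log-probability scale.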
Bibliographic reference. Sakai, Shinsuke / Shu, Han (2005): "A probabilistic approach to unit selection for corpus-based speech synthesis", In INTERSPEECH-2005, 81-84.