ISCA Archive SSW 2010

Joint prosodic and segmental unit selection for expressive speech synthesis

Christophe Veaux, Pierre Lanchantin, Xavier Rodet

One problem in concatenative speech synthesis is how to incorporate prosodic factors into unit selection. Imposing a predicted prosodic contour as the target specification is error-prone and does not benefit from the natural variability contained in the database. This paper introduces a method that searches for the optimal unit sequence by maximizing a joint likelihood at both the segmental and prosodic levels. At the segmental level, the concatenation cost and target cost are reformulated in terms of conditional and a priori probabilities, which are combined with probabilistic models of fundamental frequency and duration at the syllable level and the phrase level. A generalized version of the Viterbi algorithm is used to take into account the long-term dependencies introduced by the prosodic models during the search for the optimal unit sequence. This method has been implemented in a unit selection synthesizer using an expressive speech database, and a subjective evaluation shows an improvement in prosodic quality, although the overall quality is only slightly enhanced.
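The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that segmental scores are log-probabilities (a prior per candidate unit and a conditional term per join), that the prosodic model scores a whole path, and that keeping `k` survivor paths per candidate is an acceptable stand-in for a generalized Viterbi search. All names (`joint_unit_selection`, `log_prior`, `log_trans`, `log_prosody`) and the toy F0 values are hypothetical.

```python
def joint_unit_selection(candidates, log_prior, log_trans, log_prosody, k=2):
    """Pick a unit sequence maximizing segmental + prosodic log-likelihood.

    candidates:  list (one entry per target position) of lists of unit ids
    log_prior:   f(position, unit) -> log a priori probability (target term)
    log_trans:   f(prev_unit, unit) -> log conditional probability (join term)
    log_prosody: f(path) -> log-likelihood of the whole path under a
                 prosodic model (assumed here to be path-level)
    k:           number of survivor paths kept per candidate unit
    """
    # survivors[u] holds up to k (segmental_score, path) pairs ending in u.
    survivors = {u: [(log_prior(0, u), [u])] for u in candidates[0]}
    for t in range(1, len(candidates)):
        new_survivors = {}
        for u in candidates[t]:
            extended = [
                (score + log_trans(path[-1], u) + log_prior(t, u), path + [u])
                for paths in survivors.values()
                for score, path in paths
            ]
            # Keep several survivors instead of one, so that a path that is
            # slightly worse segmentally can still win on prosody later.
            extended.sort(key=lambda item: item[0], reverse=True)
            new_survivors[u] = extended[:k]
        survivors = new_survivors
    # Add the long-term (path-level) prosodic score and take the best path.
    finals = [
        (score + log_prosody(path), path)
        for paths in survivors.values()
        for score, path in paths
    ]
    return max(finals, key=lambda item: item[0])


# Toy data (hypothetical): two target positions, two candidates each.
toy_candidates = [["a1", "a2"], ["b1", "b2"]]
toy_f0 = {"a1": 100.0, "a2": 120.0, "b1": 200.0, "b2": 125.0}


def toy_log_prior(position, unit):
    return 0.0  # uniform prior over candidates


def toy_log_trans(prev_unit, unit):
    # Only the a1 -> b1 join is acoustically smooth in this toy example.
    return -0.1 if (prev_unit, unit) == ("a1", "b1") else -1.0


def toy_log_prosody(path):
    # Penalize large F0 jumps between consecutive units.
    return -sum(abs(toy_f0[a] - toy_f0[b]) for a, b in zip(path, path[1:])) / 10.0


best_score, best_path = joint_unit_selection(
    toy_candidates, toy_log_prior, toy_log_trans, toy_log_prosody, k=2
)
```

In this toy setup the segmentally best join (`a1` to `b1`) implies a large F0 jump, so the joint score prefers a different path. With `k=1` the search reduces to a standard Viterbi pass followed by prosodic rescoring of one survivor per final unit, which can miss the jointly best path.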

Index Terms: speech synthesis, unit selection, prosody


Cite as: Veaux, C., Lanchantin, P., Rodet, X. (2010) Joint prosodic and segmental unit selection for expressive speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 323-327

@inproceedings{veaux10_ssw,
  author={Christophe Veaux and Pierre Lanchantin and Xavier Rodet},
  title={{Joint prosodic and segmental unit selection for expressive speech synthesis}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={323--327}
}