10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Vocalic Sandwich, a Unit Designed for Unit Selection TTS

Didier Cadic (1), Cédric Boidin (1), Christophe d'Alessandro (2)

(1) Orange Labs, France
(2) LIMSI, France

Unit selection text-to-speech systems currently produce very natural synthetic sentences by concatenating speech segments from a large database. Recently, the increasing demand for high-quality voices built from less data has created a need for further optimization of the textual corpus recorded by the speaker. The optimization of this corpus is traditionally guided by the coverage rate of well-known units such as triphones and words. Such units, however, are not dedicated to concatenative speech synthesis; they are of general use in speech technologies and linguistics. In this paper, we describe a new unit that takes into account the specific features of concatenative TTS: the "vocalic sandwich." Both an objective and a perceptual evaluation tend to show that vocalic sandwiches are appropriate units for corpus design.

Bibliographic reference. Cadic, Didier / Boidin, Cédric / d'Alessandro, Christophe (2009): "Vocalic sandwich, a unit designed for unit selection TTS", in INTERSPEECH-2009, 2079-2082.