Unit selection text-to-speech systems currently produce very natural synthetic sentences by concatenating speech segments from a large database. Recently, increasing demand for designing high quality voices with less data creates need for further optimization of the textual corpus recorded by the speaker. The optimization process of this corpus is traditionally guided by the coverage rate of well-known units: triphones, words…. Such units are however not dedicated to concatenative speech synthesis; they are of general use in speech technologies and linguistics. In this paper, we describe a new unit which takes account of concatenative TTS own features: the "vocalic sandwich." Both an objective and a perceptual evaluation tend to show that vocalic sandwiches are appropriate units for corpus design.
Bibliographic reference. Cadic, Didier / Boidin, Cédric / d'Alessandro, Christophe (2009): "Vocalic sandwich, a unit designed for unit selection TTS", In INTERSPEECH-2009, 2079-2082.