The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Composite TTS Voices

Alistair Conkie, Ann K. Syrdal

AT&T Labs – Research, Florham Park, NJ, USA

A new approach to synthetic voice generation and modification is described. One aspect of the approach is that, unlike the commonly used Gaussian Mixture Model (GMM) paradigm and the newer eigenvoice techniques, it makes no attempt to parametrize voices. Instead, a straightforward unit selection approach is adopted. A second aspect is that we systematically examine mixing units from different voices in a unit selection context. We present experimental results to show the effect of different voice mixing strategies. The modified voices we produce are of high quality, but they do not offer the full range of modifications achievable with voice conversion.

Perceptual evaluations of voice similarity and paired comparison preference judgments of the synthetic voices were used to examine the importance of several features or classes of phones to perceived speaker identity.
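The abstract does not give the system's cost functions or exactly how mixing strategies are defined. The following is therefore only a minimal sketch of generic unit selection, not the authors' implementation. It performs a Viterbi search over target and join costs against a composite inventory, and the mixing strategy is modeled as a hypothetical mapping from phone class to the voice that supplies units for that class. `Unit`, `PHONE_CLASS`, `select_units`, the cost weights, and `switch_penalty` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    voice: str   # source speaker the unit was recorded from
    phone: str   # phone label, e.g. "aa", "s"
    f0: float    # mean pitch in Hz (illustrative feature only)
    dur: float   # duration in ms (illustrative feature only)


# Hypothetical phone-class table; a real system would cover a full phone set.
PHONE_CLASS = {"aa": "vowel", "iy": "vowel", "s": "fricative", "f": "fricative", "t": "stop"}


def candidates(inventory, phone, mix):
    """Units for `phone`, restricted to the voice that `mix` assigns to its class."""
    voice = mix[PHONE_CLASS[phone]]
    return [u for u in inventory if u.phone == phone and u.voice == voice]


def target_cost(unit, target):
    """Illustrative target cost: weighted distance to the desired (f0, duration)."""
    f0, dur = target
    return abs(unit.f0 - f0) / 10.0 + abs(unit.dur - dur) / 20.0


def join_cost(prev, cur, switch_penalty=1.0):
    """Illustrative join cost: pitch mismatch plus a hypothetical cross-voice penalty."""
    cost = abs(prev.f0 - cur.f0) / 10.0
    if prev.voice != cur.voice:
        cost += switch_penalty
    return cost


def select_units(inventory, phones, targets, mix):
    """Viterbi search over candidate units; returns the lowest-cost unit sequence."""
    lattice = [candidates(inventory, p, mix) for p in phones]
    if any(not column for column in lattice):
        raise ValueError("no candidate units for some phone under this mix")
    # history[i][j] = (best cumulative cost, back-pointer) for candidate j at position i
    history = [[(target_cost(u, targets[0]), None) for u in lattice[0]]]
    for i in range(1, len(phones)):
        column = []
        for u in lattice[i]:
            tc = target_cost(u, targets[i])
            column.append(min(
                (history[-1][k][0] + join_cost(prev, u) + tc, k)
                for k, prev in enumerate(lattice[i - 1])
            ))
        history.append(column)
    # Trace back from the cheapest candidate at the final position.
    j = min(range(len(history[-1])), key=lambda k: history[-1][k][0])
    path = []
    for i in range(len(phones) - 1, -1, -1):
        path.append(lattice[i][j])
        j = history[i][j][1]
    return list(reversed(path))


# Usage: vowels come from voice B, consonants from voice A.
inventory = [
    Unit("A", "s", 120, 90), Unit("B", "s", 200, 85),
    Unit("A", "aa", 118, 140), Unit("B", "aa", 210, 150),
    Unit("A", "t", 125, 60), Unit("B", "t", 205, 55),
]
mix = {"vowel": "B", "fricative": "A", "stop": "A"}
path = select_units(inventory, ["s", "aa", "t"], [(150, 90), (150, 140), (150, 60)], mix)
print([(u.voice, u.phone) for u in path])
```

Restricting candidates by phone class is one simple way to express a mixing strategy. An alternative under the same assumptions would admit units from every voice and let the cross-voice join penalty decide where switches occur.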

Index Terms: voice conversion, speech synthesis, unit selection, voice similarity, speaker identity


Bibliographic reference.  Conkie, Alistair / Syrdal, Ann K. (2010): "Composite TTS voices", In SSW7-2010, 45-48.