ISCA Archive SSW 2010

Composite TTS voices

Alistair Conkie, Ann K. Syrdal

A new approach to synthetic voice generation and modification is described. One aspect of the approach is that no attempt is made to parametrize voices, unlike the commonly used Gaussian Mixture Model (GMM) paradigm and the newer eigenvoice techniques. Instead, a straightforward unit selection approach is adopted. A second aspect is that we systematically examine mixing units from different voices in a unit selection context. We present experimental results to show the effect of different voice mixing strategies. The modified voices we produce are of high quality but do not cover the full range of possibilities achievable using voice conversion.

Perceptual evaluations of voice similarity and paired-comparison preference judgments of the synthetic voices were used to examine the importance of several features or classes of phones to perceived speaker identity.

Index Terms: voice conversion, speech synthesis, unit selection, voice similarity, speaker identity

Cite as: Conkie, A., Syrdal, A.K. (2010) Composite TTS voices. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 45-48

  author={Alistair Conkie and Ann K. Syrdal},
  title={{Composite TTS voices}},
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},