Recently, a glottal vocoder has been integrated into the IBM concatenative
TTS system, and certain configurable global voice transformations were
defined in the vocoder parameter space. The vocoder analysis employs
a novel robust glottal source parameter estimation strategy. The vocoder
is applied to the voiced speech only, while unvoiced speech is kept
unparameterized, thus contributing to the perceived naturalness of
the synthesized speech.
The semi-parametric system enables independent on-the-fly modification
of the glottal source and vocal tract components by embedding the voice
transformations in the synthesis process. The effect of the transformations
ranges from a slight voice alteration to a complete change of the perceived
speaker personality.
Pitch modifications enhance these changes. At the same time, the voice
transformations are simple enough to be easily controlled externally
to the system. This allows users either to fine-tune the voice
or to instantly create multiple distinct virtual voices. In both
cases, the synthesis is based on a large and meticulously cleaned concatenative
TTS voice with a broad phonetic coverage. In this paper we present
the system and provide subjective evaluations of its voice modification
capabilities.
The technology presented in this paper is implemented in the IBM Watson
TTS service.
Cite as: Sorin, A., Shechtman, S., Rendel, A. (2017) Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities. Proc. Interspeech 2017, 1373-1377, doi: 10.21437/Interspeech.2017-1202
@inproceedings{sorin17_interspeech,
  author={Alexander Sorin and Slava Shechtman and Asaf Rendel},
  title={{Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1373--1377},
  doi={10.21437/Interspeech.2017-1202}
}