Speech synthesizers are required to produce speech that is both more diverse and of higher quality. Speaker interpolation and voice conversion are techniques that enhance diversity. The PUSF (plural unit selection and fusion) method, which we proposed previously, generates synthesized waveforms from pitch-cycle waveforms. However, with this method it is difficult to modify spectral features while preserving the naturalness of the synthesized speech. In the present work, we investigated how best to represent speech waveforms. First, we introduce a method that decomposes a pitch waveform in a voiced portion into a periodic component, which is excited by the vocal sound source, and an aperiodic component, which is excited by a noise source. Second, we introduce the FWF (formant waveform) model to represent the periodic component. Because the FWF model represents the pitch waveform in terms of formant parameters, the formant parameters can be controlled independently. Because this model is based on vocal tract features, the resulting method can easily be combined with the diversity-enhancing techniques in PUSF-based synthesis.
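To illustrate what representing a pitch waveform by formant parameters can look like, here is a minimal sketch. It is not the FWF model itself, whose formulation the abstract does not give. It assumes a generic textbook form in which each formant contributes one exponentially damped sinusoid and the periodic component is their sum. The function names, the 16 kHz sample rate, and the formant values are all hypothetical, chosen only for illustration. The aperiodic (noise-excited) component is not modeled here.

```python
import math


def formant_waveform(freq_hz, bandwidth_hz, amplitude, n_samples, sample_rate=16000):
    """Return one damped sinusoid: amplitude * exp(-pi*B*t) * cos(2*pi*F*t).

    Assumed generic form, not the FWF definition from the paper.
    """
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        envelope = math.exp(-math.pi * bandwidth_hz * t)
        out.append(amplitude * envelope * math.cos(2.0 * math.pi * freq_hz * t))
    return out


def periodic_component(formants, n_samples, sample_rate=16000):
    """Sum the per-formant waveforms into one pitch-cycle waveform."""
    total = [0.0] * n_samples
    for freq_hz, bandwidth_hz, amplitude in formants:
        wave = formant_waveform(freq_hz, bandwidth_hz, amplitude, n_samples, sample_rate)
        total = [a + b for a, b in zip(total, wave)]
    return total


# Hypothetical (frequency Hz, bandwidth Hz, amplitude) triples, for illustration only.
formants = [(700.0, 80.0, 1.0), (1200.0, 100.0, 0.5), (2600.0, 150.0, 0.25)]
original = periodic_component(formants, 160)

# Shift only the second formant; the other formants' parameters are untouched.
shifted_formants = list(formants)
shifted_formants[1] = (1400.0, 100.0, 0.5)
shifted = periodic_component(shifted_formants, 160)
```

In a parametric representation of this kind, changing one formant's frequency leaves the contributions of the other formants unchanged. That is the sort of independent control the abstract attributes to the FWF model.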
Bibliographic reference. Morinaka, Ryo / Tamura, Masatsune / Morita, Masahiro / Kagoshima, Takehiko (2009): "Speech synthesis based on the plural unit selection and fusion method using FWF model", In INTERSPEECH-2009, 2083-2086.