ISCA Archive Interspeech 2013

Statistical nonparametric speech synthesis using sparse Gaussian processes

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

This paper proposes a statistical nonparametric speech synthesis technique based on sparse Gaussian process regression (GPR). In our previous study, we proposed GPR-based speech synthesis in which each frame of a synthesis unit is modeled by Gaussian process regression. Preliminary experiments on synthesizing several phones, including both vowels and consonants, showed the potential of the technique. In this paper, the previous work is extended to full-sentence speech synthesis using sparse GPs and context modification. Specifically, cluster-based sparse Gaussian processes, such as local GPs and the partially independent conditional (PIC) approximation, are examined as computationally feasible approaches. Moreover, the frame-level context is extended to include not only a position context from the current phone but also adjacent phones, in order to generate smoothly changing speech parameters. Objective and subjective evaluation results show that the proposed technique outperforms HMM-based speech synthesis with minimum generation error training.
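The abstract names local GPs as one cluster-based sparse approximation. As a rough illustration of that general idea only, the sketch below partitions the training inputs into clusters, fits an exact GP to each cluster, and predicts each test point from the GP of its nearest cluster. This is not the authors' implementation: the function names, the RBF kernel, the k-means clustering, the nearest-center assignment, and all hyperparameter values are assumptions chosen for the example, and the PIC approximation is not covered.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def kmeans(x, k, n_iter=20, seed=0):
    """Plain k-means; returns centers and the labels from the last assignment."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :])**2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

def fit_local_gps(x, y, k, noise=1e-2):
    """Fit one exact GP per cluster; cost is cubic in cluster size, not in len(x)."""
    centers, labels = kmeans(x, k)
    models = []
    for j in range(k):
        mask = labels == j
        if not np.any(mask):
            continue  # skip empty clusters
        xj, yj = x[mask], y[mask]
        K = rbf_kernel(xj, xj) + noise * np.eye(len(xj))
        L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed from the Cholesky factor
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, yj))
        models.append((centers[j], xj, alpha))
    return models

def predict_local_gps(models, x_new):
    """Predictive mean of each test point from the GP of its nearest cluster."""
    centers = np.stack([m[0] for m in models])
    dist = ((x_new[:, None, :] - centers[None, :, :])**2).sum(axis=2)
    nearest = np.argmin(dist, axis=1)
    mean = np.empty(len(x_new))
    for i, j in enumerate(nearest):
        _, xj, alpha = models[j]
        mean[i] = (rbf_kernel(x_new[i:i + 1], xj) @ alpha)[0]
    return mean
```

On a toy problem, `fit_local_gps(x, y, k=4)` followed by `predict_local_gps(models, x_test)` returns one predicted mean per test row. Because each local GP sees only its own cluster, predictions can be discontinuous at cluster boundaries, which is the kind of limitation that approximations such as PIC are designed to reduce.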

doi: 10.21437/Interspeech.2013-121

Cite as: Koriyama, T., Nose, T., Kobayashi, T. (2013) Statistical nonparametric speech synthesis using sparse Gaussian processes. Proc. Interspeech 2013, 1072-1076, doi: 10.21437/Interspeech.2013-121

  author={Tomoki Koriyama and Takashi Nose and Takao Kobayashi},
  title={{Statistical nonparametric speech synthesis using sparse Gaussian processes}},
  booktitle={Proc. Interspeech 2013},