Speaker-Dependent WaveNet Vocoder

Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda


In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing speech waveforms with WaveNet, in which acoustic features extracted by an existing vocoder are used as auxiliary features of WaveNet. It is expected that WaveNet can learn a sample-by-sample correspondence between the speech waveform and the acoustic features. The advantage of the proposed method is that it requires neither (1) explicit modeling of excitation signals nor (2) assumptions based on prior knowledge specific to speech. We conducted both objective and subjective evaluation experiments on the CMU ARCTIC database. The objective evaluation demonstrated that the proposed method can generate high-quality speech, recovering phase information that is lost by a mel-cepstrum vocoder. The subjective evaluation demonstrated that the sound quality of the proposed method is significantly better than that of the mel-cepstrum vocoder, and that the proposed method captures source excitation information more accurately.
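The core idea described above can be illustrated with a minimal sketch, not the authors' implementation: frame-level acoustic features (e.g., mel-cepstra from an existing vocoder) are upsampled to the waveform's sample rate and injected as a local conditioning term inside a gated causal layer, so the network sees a sample-by-sample pairing of waveform and features. All function names, shapes, and the nearest-neighbor upsampling choice here are illustrative assumptions.

```python
import numpy as np

def upsample_features(features, hop_size):
    """Repeat each frame's feature vector hop_size times
    (nearest-neighbor upsampling) so the frame-level features
    align sample-by-sample with the waveform.
    features: (n_frames, D) -> returns (n_frames * hop_size, D)."""
    return np.repeat(features, hop_size, axis=0)

def gated_causal_layer(x, h, w_x, w_h, dilation=1):
    """One WaveNet-style gated activation with local conditioning:
    tanh(filter pre-activation) * sigmoid(gate pre-activation).
    x: (T,) waveform samples; h: (T, D) upsampled conditioning features;
    w_x: (2,) scalar weights on the waveform path (filter, gate);
    w_h: (2, D) weights on the conditioning path (filter, gate)."""
    # Causal shift: each output sample depends only on past samples.
    x_shift = np.concatenate([np.zeros(dilation), x[:-dilation]])
    z_f = w_x[0] * x_shift + h @ w_h[0]  # filter pre-activation
    z_g = w_x[1] * x_shift + h @ w_h[1]  # gate pre-activation
    return np.tanh(z_f) * (1.0 / (1.0 + np.exp(-z_g)))
```

In a full model, many such dilated layers are stacked with residual and skip connections, and the output is a categorical distribution over quantized sample values; the sketch above only shows how the auxiliary features enter each layer.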


DOI: 10.21437/Interspeech.2017-314

Cite as: Tamamori, A., Hayashi, T., Kobayashi, K., Takeda, K., Toda, T. (2017) Speaker-Dependent WaveNet Vocoder. Proc. Interspeech 2017, 1118-1122, DOI: 10.21437/Interspeech.2017-314.


@inproceedings{Tamamori2017,
  author={Akira Tamamori and Tomoki Hayashi and Kazuhiro Kobayashi and Kazuya Takeda and Tomoki Toda},
  title={Speaker-Dependent WaveNet Vocoder},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={1118--1122},
  doi={10.21437/Interspeech.2017-314},
  url={http://dx.doi.org/10.21437/Interspeech.2017-314}
}