A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction

Prachi Govalkar, Johannes Fischer, Frank Zalkow, Christian Dittmar

In recent years, text-to-speech (TTS) synthesis has benefited from advanced machine learning approaches. Most prominently, since the introduction of the WaveNet architecture, neural vocoders have produced synthesized speech of markedly higher naturalness than traditional vocoders. In this paper, we present a fair comparison of recent neural vocoders in a signal reconstruction scenario. That is, we use these techniques to resynthesize speech waveforms from mel-scaled spectrograms, a compact and generally non-invertible representation of the underlying audio signal. In that context, we conduct listening tests according to the well-established MUSHRA standard and compare the results with those of similar studies. Since we weigh perceptual quality against computational requirements, our findings are intended to serve as a guideline for both practitioners and researchers in speech synthesis.
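Why a mel-scaled spectrogram is "generally non-invertible" can be illustrated with a small NumPy sketch. Two pieces of information are lost: the STFT phase is discarded when taking magnitudes, and the mel filterbank maps many linear-frequency bins onto far fewer mel bands, so the mapping has a large null space. The parameters below (22.05 kHz sampling rate, 1024-point FFT, hop size 256, 80 mel bands, HTK-style mel formula, unnormalized triangular filters, white-noise test signal) are illustrative assumptions, not the settings used in the paper. The pseudo-inverse step only recovers a smoothed magnitude estimate; the phase would still have to be estimated, which is the gap a vocoder must fill.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption; other variants such as Slaney's exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(x, sr, n_fft=1024, hop=256, n_mels=80):
    """Magnitude mel spectrogram of shape (n_frames, n_mels), plus the STFT and filterbank."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    stft = np.fft.rfft(frames, axis=1)   # complex, shape (n_frames, n_fft // 2 + 1)
    magnitude = np.abs(stft)             # the phase is discarded here
    fb = mel_filterbank(sr, n_fft, n_mels)
    return magnitude @ fb.T, stft, fb

sr = 22050
x = np.random.default_rng(0).standard_normal(sr)  # one second of white noise
mel, stft, fb = mel_spectrogram(x, sr)

# 513 linear-frequency bins are compressed into 80 mel bands, so rank(fb) <= 80
rank = np.linalg.matrix_rank(fb)

# Minimum-norm magnitude estimate from the mel bands via the pseudo-inverse
magnitude = np.abs(stft)
mag_est = mel @ np.linalg.pinv(fb).T
rel_err = np.linalg.norm(mag_est - magnitude) / np.linalg.norm(magnitude)
print(fb.shape, mel.shape, rank, round(float(rel_err), 3))
```

The remaining relative error reflects the fine spectral detail that falls into the filterbank's null space. Even a perfect magnitude estimate would still leave the phase undetermined, so reconstruction requires either iterative phase retrieval or a learned model such as the neural vocoders compared here.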

DOI: 10.21437/SSW.2019-2

Cite as: Govalkar, P., Fischer, J., Zalkow, F., Dittmar, C. (2019) A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction. Proc. 10th ISCA Speech Synthesis Workshop, 7-12, DOI: 10.21437/SSW.2019-2.
