ISCA Archive Interspeech 2021

Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values

Olivier Perrotin, Hussein El Amouri, Gérard Bailly, Thomas Hueber

Neural vocoders are systematically evaluated on homogeneous train and test databases. This kind of evaluation is efficient for comparing neural vocoders in their “comfort zone”, yet it hardly reveals their limits on data unseen during training. To compare their extrapolation capabilities, we introduce a methodology that quantifies the robustness of neural vocoders in synthesising unseen data by precisely controlling the ranges of seen/unseen data in the training database. Focusing in this study on the pitch (F0) parameter, our methodology involves a careful splitting of a dataset to control which F0 values are seen/unseen during training, followed by both global (utterance) and local (frame) evaluation of vocoders. Comparison of four types of vocoders (autoregressive, source-filter, flows, GAN) displays a wide range of behaviour towards unseen input pitch values, including excellent extrapolation (WaveGlow); widely-spread F0 errors (WaveRNN); and systematic generation of the training set median F0 (LPCNet, Parallel WaveGAN). In contrast, fewer differences between vocoders were observed when using homogeneous train and test sets, thus demonstrating the potential and need for such evaluation to better discriminate neural vocoders' abilities to generate out-of-training-range data.
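The splitting and the two evaluation levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of 0 Hz to mark unvoiced frames, the rule that an utterance counts as "seen" only if all its voiced frames fall inside the training range, and the choice of cents as the error unit are all assumptions made for illustration.

```python
import math
from statistics import median


def split_by_f0(utterances, low_hz, high_hz):
    """Partition utterance ids by whether all voiced frames lie in [low_hz, high_hz].

    utterances: dict mapping utterance id -> list of per-frame F0 values in Hz,
    where 0 marks an unvoiced frame (assumed convention, not from the paper).
    """
    seen, unseen = [], []
    for utt_id, f0 in utterances.items():
        voiced = [v for v in f0 if v > 0]
        if not voiced:
            continue  # no pitch information, so the utterance is left out
        if all(low_hz <= v <= high_hz for v in voiced):
            seen.append(utt_id)    # candidate for the training set
        else:
            unseen.append(utt_id)  # held out to probe extrapolation
    return seen, unseen


def frame_f0_errors(reference, generated):
    """Local (frame) evaluation: signed F0 error in cents on frames voiced in both."""
    return [
        1200 * math.log2(g / r)
        for r, g in zip(reference, generated)
        if r > 0 and g > 0
    ]


def utterance_f0_error(reference, generated):
    """Global (utterance) evaluation: median absolute F0 error in cents, or None."""
    errors = [abs(e) for e in frame_f0_errors(reference, generated)]
    return median(errors) if errors else None
```

For example, `split_by_f0(corpus, 80, 150)` with a hypothetical `corpus` dictionary would keep for training only the utterances whose voiced frames stay between 80 Hz and 150 Hz, so that any test utterance outside that range exercises the vocoder on unseen pitch values.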


doi: 10.21437/Interspeech.2021-1547

Cite as: Perrotin, O., Amouri, H.E., Bailly, G., Hueber, T. (2021) Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values. Proc. Interspeech 2021, 11-15, doi: 10.21437/Interspeech.2021-1547

@inproceedings{perrotin21_interspeech,
  author={Olivier Perrotin and Hussein El Amouri and Gérard Bailly and Thomas Hueber},
  title={{Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={11--15},
  doi={10.21437/Interspeech.2021-1547}
}