ISCA Archive Interspeech 2021

A Benchmark of Dynamical Variational Autoencoders Applied to Speech Spectrogram Modeling

Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers has presented different extensions of the VAE to sequential data. These extensions model not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In the present paper, we report the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.


doi: 10.21437/Interspeech.2021-256

Cite as: Bie, X., Girin, L., Leglaive, S., Hueber, T., Alameda-Pineda, X. (2021) A Benchmark of Dynamical Variational Autoencoders Applied to Speech Spectrogram Modeling. Proc. Interspeech 2021, 46-50, doi: 10.21437/Interspeech.2021-256

@inproceedings{bie21_interspeech,
  author={Xiaoyu Bie and Laurent Girin and Simon Leglaive and Thomas Hueber and Xavier Alameda-Pineda},
  title={{A Benchmark of Dynamical Variational Autoencoders Applied to Speech Spectrogram Modeling}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={46--50},
  doi={10.21437/Interspeech.2021-256}
}