Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery

Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj


Variational Autoencoders (VAEs) have been shown to provide efficient neural-network-based approximate Bayesian inference for observation models for which exact inference is intractable. Their extension, the so-called Structured VAE (SVAE), allows inference in the presence of both discrete and continuous latent variables. Inspired by this extension, we developed a VAE with a Hidden Markov Model (HMM) as the latent model. We applied the resulting HMM-VAE to the task of acoustic unit discovery in a zero-resource scenario. Starting from an initial model based on variational inference in an HMM with Gaussian Mixture Model (GMM) emission probabilities, the accuracy of the acoustic unit discovery could be significantly improved by the HMM-VAE. In doing so, we were able to demonstrate for an unsupervised learning task what is well known in the supervised learning case: neural networks provide superior modeling power compared to GMMs.
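To make the latent structure concrete, the following is a minimal illustrative sketch (not the authors' implementation) of the generative story behind such a model: a discrete HMM state sequence selects state-dependent Gaussians over continuous latents, which are then passed through a decoder to produce observations. All dimensions, parameters, and the linear "decoder" are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: K HMM states, D-dim continuous latents, T frames.
K, D, T = 3, 2, 50
A = np.array([[0.8, 0.1, 0.1],   # HMM transition matrix (rows sum to 1)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
pi = np.full(K, 1.0 / K)         # initial state distribution
mu = rng.normal(size=(K, D))     # state-dependent means over the latent space
W = 0.5 * rng.normal(size=(D, 4))  # stand-in linear "decoder" weights

# 1) Sample a discrete state sequence z_1..z_T from the Markov chain.
z = np.empty(T, dtype=int)
z[0] = rng.choice(K, p=pi)
for t in range(1, T):
    z[t] = rng.choice(K, p=A[z[t - 1]])

# 2) Sample continuous latents x_t ~ N(mu[z_t], sigma^2 I) given the states.
x = mu[z] + 0.1 * rng.normal(size=(T, D))

# 3) Decode latents into observations (a neural decoder in the actual model).
y = x @ W + 0.01 * rng.normal(size=(T, 4))

print(y.shape)  # (50, 4)
```

In the paper's setting, the decoder is a neural network rather than this linear map, and inference over both the discrete states and the continuous latents is carried out variationally, in the spirit of the SVAE.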


DOI: 10.21437/Interspeech.2017-1160

Cite as: Ebbers, J., Heymann, J., Drude, L., Glarner, T., Haeb-Umbach, R., Raj, B. (2017) Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. Proc. Interspeech 2017, 488-492, DOI: 10.21437/Interspeech.2017-1160.


@inproceedings{Ebbers2017,
  author={Janek Ebbers and Jahn Heymann and Lukas Drude and Thomas Glarner and Reinhold Haeb-Umbach and Bhiksha Raj},
  title={Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={488--492},
  doi={10.21437/Interspeech.2017-1160},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1160}
}