ISCA Archive SSW 2023

Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models

Maxime Jacquelin, Maeva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin

Understanding the latent representations of speech learned by unsupervised models enables powerful signal analysis, transformation, and generation. Numerous studies have identified directions of variation of acoustic features such as fundamental frequency or formants in the latent spaces of unsupervised models, but it is not yet well understood why the variation of such one-dimensional features is often explained by multiple latent dimensions. This paper proposes a methodology for interpreting these dimensions in the latent space of a variational autoencoder trained on a multi-speaker database.


Cite as: Jacquelin, M., Garnier, M., Girin, L., Vincent, R., Perrotin, O. (2023) Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 240-241

@inproceedings{jacquelin23_ssw,
  author={Maxime Jacquelin and Maeva Garnier and Laurent Girin and Rémy Vincent and Olivier Perrotin},
  title={{Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={240--241}
}