Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments

Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad, Emmanuel Dupoux


Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high-resource language, and Xitsonga, a low-resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can outperform the variable-length input feature representation on both evaluations. Recurrent autoencoders, trained without supervision, can yield even better results at the expense of increased computational complexity.
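
The downsampling baseline described in the abstract can be illustrated with a minimal sketch: keep a fixed number of evenly spaced feature frames from each segment and append the segment length as an extra dimension. The feature type (MFCC-like frames), the number of sampled frames, and the function names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def downsample_embedding(frames, n_samples=10):
    """Map a variable-length sequence of feature frames (T x D) to a
    fixed-size vector by keeping n_samples evenly spaced frames and
    appending the segment length as one extra dimension.
    n_samples=10 is an illustrative choice, not necessarily the paper's."""
    T, D = frames.shape
    # Indices of n_samples frames spread evenly over the segment.
    idx = np.linspace(0, T - 1, n_samples).round().astype(int)
    embedding = frames[idx].reshape(-1)        # n_samples * D values
    return np.concatenate([embedding, [T]])    # append length information

# Usage: two segments of different lengths map to vectors of equal size.
seg_a = np.random.randn(37, 13)   # e.g. 37 MFCC frames of dimension 13
seg_b = np.random.randn(92, 13)
print(downsample_embedding(seg_a).shape, downsample_embedding(seg_b).shape)
# -> (131,) (131,)
```

Because both segments are reduced to the same dimensionality, standard distance measures (e.g. cosine distance) can then be used for the ABX and same-different evaluations mentioned above.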


DOI: 10.21437/Interspeech.2018-2364

Cite as: Holzenberger, N., Du, M., Karadayi, J., Riad, R., Dupoux, E. (2018) Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments. Proc. Interspeech 2018, 2683-2687, DOI: 10.21437/Interspeech.2018-2364.


@inproceedings{Holzenberger2018,
  author={Nils Holzenberger and Mingxing Du and Julien Karadayi and Rachid Riad and Emmanuel Dupoux},
  title={Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2683--2687},
  doi={10.21437/Interspeech.2018-2364},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2364}
}