Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition

Alicia Lozano-Diez, Joaquin Gonzalez-Rodriguez, Javier Gonzalez-Dominguez


In this manuscript, we summarize the findings presented in Alicia Lozano-Diez's Ph.D. Thesis, defended on the 22nd of June, 2018, at Universidad Autonoma de Madrid (Spain). In particular, this Ph.D. Thesis explores different approaches to the tasks of language and speaker recognition, focusing on systems in which deep neural networks (DNNs) become part of traditional pipelines, replacing either some of their stages or the whole system. First, we present a DNN used as a classifier for the task of language recognition. Second, we analyze the use of DNNs for frame-level feature extraction, producing the so-called bottleneck features, for both language and speaker recognition. Finally, we describe an utterance-level representation of speech segments learned by the DNN (known as an embedding) and apply it to the task of language recognition. All these approaches provide alternatives to classical language and speaker recognition systems based on i-vectors (Total Variability modeling) over acoustic features (MFCCs, for instance). Moreover, they usually achieve better recognition performance than these i-vector baselines.
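The two DNN-based representations mentioned above differ mainly in where the network is read out: bottleneck features are the activations of a narrow hidden layer computed for every frame, while an embedding is a single vector per utterance taken after the frame-level outputs are pooled over time. The following NumPy sketch illustrates that distinction under loudly stated assumptions: random weights stand in for a trained network, every layer size is hypothetical, the bottleneck is kept linear, and the utterance-level part uses mean and standard-deviation pooling followed by one affine layer (an x-vector-style choice), which is not necessarily the architecture used in the thesis. The names `bottleneck_features` and `utterance_embedding` are illustrative only.

```python
import numpy as np

# All sizes below are hypothetical, chosen only for illustration.
FEAT_DIM = 20        # e.g. 20 MFCCs per frame
HIDDEN_DIM = 64
BOTTLENECK_DIM = 8   # narrow layer whose activations become the features
EMBED_DIM = 12

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Random weights stand in for a trained network (assumption).
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

# Frame-level network: input -> wide hidden -> narrow bottleneck -> wide hidden.
# The layer after the bottleneck (plus an output layer, omitted here) would only
# be needed while training; feature extraction stops at the bottleneck.
frame_layers = [
    init_layer(FEAT_DIM, HIDDEN_DIM),
    init_layer(HIDDEN_DIM, BOTTLENECK_DIM),
    init_layer(BOTTLENECK_DIM, HIDDEN_DIM),
]
BOTTLENECK_LAYER = 1

# Hypothetical utterance-level layer applied after pooling (mean + std).
segment_layer = init_layer(2 * BOTTLENECK_DIM, EMBED_DIM)

def relu(x):
    return np.maximum(x, 0.0)

def bottleneck_features(frames):
    """Map a (T, FEAT_DIM) array of frames to (T, BOTTLENECK_DIM) features."""
    h = frames
    for i, (W, b) in enumerate(frame_layers):
        z = h @ W + b
        if i == BOTTLENECK_LAYER:
            return z  # linear bottleneck: activations taken before any ReLU
        h = relu(z)
    raise RuntimeError("bottleneck layer index out of range")

def utterance_embedding(frames):
    """Pool frame-level outputs over time, then apply an utterance-level layer."""
    h = bottleneck_features(frames)
    stats = np.concatenate([h.mean(axis=0), h.std(axis=0)])  # size fixed for any T
    W, b = segment_layer
    return stats @ W + b  # one embedding vector per utterance

utt = rng.standard_normal((300, FEAT_DIM))  # 300 frames of synthetic input
print(bottleneck_features(utt).shape)  # (300, 8)
print(utterance_embedding(utt).shape)  # (12,)
```

Because pooling collapses the time axis, utterances of any length map to a vector of the same size, whereas bottleneck features keep one vector per frame and can be fed to a frame-based back-end (such as i-vector extraction) in place of MFCCs.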


DOI: 10.21437/IberSPEECH.2018-36

Cite as: Lozano-Diez, A., Gonzalez-Rodriguez, J., Gonzalez-Dominguez, J. (2018) Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition. Proc. IberSPEECH 2018, 179-183, DOI: 10.21437/IberSPEECH.2018-36.


@inproceedings{Lozano-Diez2018,
  author={Alicia Lozano-Diez and Joaquin Gonzalez-Rodriguez and Javier Gonzalez-Dominguez},
  title={{Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={179--183},
  doi={10.21437/IberSPEECH.2018-36},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-36}
}