ISCA Archive IberSPEECH 2022

Speaker Characterization by means of Attention Pooling

Federico Costa, Miquel India, Javier Hernando

State-of-the-art Deep Learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. The authors have recently proposed the use of a Double Multi-Head Self-Attention pooling for speaker recognition, placed between a CNN-based front-end and a set of fully connected layers. This has been shown to be an excellent approach to efficiently select the most relevant features captured by the front-end from the speech signal. In this paper we show excellent experimental results obtained by adapting this architecture to other speaker characterization tasks, such as emotion recognition, sex classification and COVID-19 detection.
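As an illustration of the pooling idea described in the abstract, the following is a minimal sketch of a two-stage attention pooling that maps a variable-length sequence of frame features to a fixed-length vector. It is not the authors' implementation: the function names (`attention_pool`, `double_attention_pool`), the scaled dot-product scoring, the random query vectors standing in for trained parameters, and the choice to pool the head context vectors with a second attention step are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weighted average of `vectors`, weighted by similarity to `query`.

    Returns the pooled vector and the attention weights.
    """
    dim = len(query)
    scores = [
        sum(v[i] * query[i] for i in range(dim)) / math.sqrt(dim)
        for v in vectors
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)
    ]
    return pooled, weights


def double_attention_pool(frames, head_queries, utterance_query):
    """Hypothetical two-stage pooling (assumed structure, not the paper's code).

    Stage 1: split each frame into equal sub-vectors (one per head) and
             pool each head over time.
    Stage 2: pool the resulting head context vectors with a second
             attention step, giving one fixed-length vector.
    """
    n_heads = len(head_queries)
    head_dim = len(frames[0]) // n_heads
    contexts = []
    for h, query in enumerate(head_queries):
        sub_frames = [f[h * head_dim:(h + 1) * head_dim] for f in frames]
        context, _ = attention_pool(sub_frames, query)
        contexts.append(context)
    return attention_pool(contexts, utterance_query)


# Random stand-ins for front-end features and trained query vectors.
rng = random.Random(0)


def random_vectors(count, dim):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


FEATURE_DIM, N_HEADS = 8, 4
head_queries = random_vectors(N_HEADS, FEATURE_DIM // N_HEADS)
utterance_query = random_vectors(1, FEATURE_DIM // N_HEADS)[0]

short_vec, short_w = double_attention_pool(
    random_vectors(5, FEATURE_DIM), head_queries, utterance_query)
long_vec, long_w = double_attention_pool(
    random_vectors(9, FEATURE_DIM), head_queries, utterance_query)
```

In this sketch, utterances of 5 and 9 frames both yield a pooled vector of the same length, which is what lets a pooling layer feed fixed-size fully connected layers regardless of utterance duration.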

doi: 10.21437/IberSPEECH.2022-34

Cite as: Costa, F., India, M., Hernando, J. (2022) Speaker Characterization by means of Attention Pooling. Proc. IberSPEECH 2022, 166-170, doi: 10.21437/IberSPEECH.2022-34

  author={Federico Costa and Miquel India and Javier Hernando},
  title={{Speaker Characterization by means of Attention Pooling}},
  booktitle={Proc. IberSPEECH 2022},