ISCA Archive IberSPEECH 2021

The Biometric Vox System for the Albayzin-RTVE 2020 Speech-to-Text Challenge

Roberto Font, Teresa Grau

This paper describes the system developed by Biometric Vox for the Albayzin Speech-to-Text Challenge, organized as part of the IberSPEECH 2020 conference. The system uses speaker diarization to split the audio into speaker-homogeneous segments and uses this segmentation to compute speaker-dependent fMLLR-transformed features. These speaker-adapted features are the input to a DNN acoustic model, which is trained for the target domain using a semi-supervised self-training procedure. Finally, an RNN language model is used to rescore the lattices and produce the final transcription. Our system achieves 22% WER on the test portion of the RTVE2018 database and 30.26% WER on the 2020 evaluation set.
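
The results above are reported as word error rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the challenge's official scoring tool, which may apply its own text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are split on whitespace, with no case or
    punctuation normalization (an assumption, not the official scoring).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.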


doi: 10.21437/IberSPEECH.2021-21

Cite as: Font, R., Grau, T. (2021) The Biometric Vox System for the Albayzin-RTVE 2020 Speech-to-Text Challenge. Proc. IberSPEECH 2021, 99-103, doi: 10.21437/IberSPEECH.2021-21

@inproceedings{font21_iberspeech,
  author={Roberto Font and Teresa Grau},
  title={{The Biometric Vox System for the Albayzin-RTVE 2020 Speech-to-Text Challenge}},
  year=2021,
  booktitle={Proc. IberSPEECH 2021},
  pages={99--103},
  doi={10.21437/IberSPEECH.2021-21}
}