The Biometric Vox System for the Albayzin-RTVE 2020 Speech-to-Text Challenge

Roberto Font, Teresa Grau

This paper describes the system developed by Biometric Vox for the Albayzin Speech-to-Text Challenge, organized as part of the IberSPEECH 2020 conference. The system uses speaker diarization to segment the audio into speaker-homogeneous segments, and uses this information to compute speaker-dependent fMLLR-transformed features. These speaker-adapted features are the input to a DNN acoustic model, which is trained for the target domain using a semi-supervised self-training procedure. Finally, an RNN language model is used to rescore the lattices and produce the final transcription. Our system achieves 22% WER on the test portion of the RTVE2018 database and 30.26% WER on the 2020 evaluation set.
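The speaker-adapted front end can be illustrated with a minimal sketch. It assumes that diarization returns frame-indexed segments labelled with speaker IDs and that a per-speaker fMLLR transform W = [A | b] has already been estimated. Estimating W needs GMM statistics and is not shown. The function names `apply_fmllr` and `adapt_features` are hypothetical, and this is not the authors' implementation. Each frame x is mapped to A x + b using the transform of the speaker who is active at that frame.

```python
from typing import Dict, List, Tuple

Frame = List[float]
Matrix = List[List[float]]
# (start_frame, end_frame_exclusive, speaker_id), as a diarizer might emit (assumed format)
Segment = Tuple[int, int, str]


def apply_fmllr(frame: Frame, W: Matrix) -> Frame:
    """Apply an affine fMLLR transform W = [A | b] (d x (d+1)) to one frame: A x + b."""
    d = len(frame)
    if len(W) != d or any(len(row) != d + 1 for row in W):
        raise ValueError("W must have shape d x (d+1)")
    extended = frame + [1.0]  # append 1 so the last column of W acts as the bias b
    return [sum(w * x for w, x in zip(row, extended)) for row in W]


def adapt_features(
    features: List[Frame],
    segments: List[Segment],
    transforms: Dict[str, Matrix],
) -> List[Frame]:
    """Transform each frame with the fMLLR matrix of its diarized speaker.

    Frames not covered by any segment are copied unchanged (identity transform).
    """
    adapted = [list(f) for f in features]
    for start, end, speaker in segments:
        W = transforms[speaker]
        for t in range(start, end):
            adapted[t] = apply_fmllr(features[t], W)
    return adapted


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    segs = [(0, 2, "spk1"), (2, 3, "spk2")]
    W = {
        "spk1": [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],   # scale by 2, no bias
        "spk2": [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],  # identity plus bias (+1, -1)
    }
    print(adapt_features(feats, segs, W))
```

Keying the transforms on diarization labels rather than on recording or show identity is the point of the design. In broadcast audio a single recording contains many speakers, so a speaker-homogeneous segmentation allows each speaker's frames to be normalized by their own transform.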

doi: 10.21437/IberSPEECH.2021-21

Font, R, Grau, T (2021) The Biometric Vox System for the Albayzin-RTVE 2020 Speech-to-Text Challenge. Proc. IberSPEECH 2021, 99-103, doi: 10.21437/IberSPEECH.2021-21.