ISCA Archive IberSPEECH 2022

TID Spanish ASR system for the Albayzin 2022 Speech-to-Text Transcription Challenge

Fernando López, Jordi Luque

This paper describes Telefónica I+D's participation in the IberSPEECH-RTVE 2022 Speech-to-Text Transcription Challenge. We built an end-to-end acoustic Automatic Speech Recognition (ASR) system based on the large XLS-R architecture. We first trained it with already aligned data from CommonVoice. We then adapted it to the TV broadcasting domain with a self-supervised method. For that purpose, we used an iterative pseudo-forced alignment algorithm fed with frame-wise character posteriors produced by our ASR system. This allowed us to recover up to 166 hours of speech from the RTVE2018 and RTVE2022 databases. We additionally explored using a transformer-based seq2seq translation system as a Language Model (LM) to correct the transcripts of the acoustic ASR system. Our best system achieved a Word Error Rate (WER) of 24.27% on the test split of RTVE2020.
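For readers unfamiliar with the metric reported above, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the abstract does not describe the challenge's scoring tool or text normalization, which may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no casing or punctuation normalization is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion of a reference word
                curr[j - 1] + 1,                   # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
wer = word_error_rate("el gato come pescado", "el pato come")
print(f"WER = {wer:.2%}")
```

Because insertions are counted as errors, WER can exceed 100% when the hypothesis is much longer than the reference.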


doi: 10.21437/IberSPEECH.2022-55

Cite as: López, F., Luque, J. (2022) TID Spanish ASR system for the Albayzin 2022 Speech-to-Text Transcription Challenge. Proc. IberSPEECH 2022, 271-275, doi: 10.21437/IberSPEECH.2022-55

@inproceedings{lopez22b_iberspeech,
  author={Fernando López and Jordi Luque},
  title={{TID Spanish ASR system for the Albayzin 2022 Speech-to-Text Transcription Challenge}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={271--275},
  doi={10.21437/IberSPEECH.2022-55}
}