ISCA Archive IberSPEECH 2022

The Vicomtech-UPM Speech Transcription Systems for the Albayzín-RTVE 2022 Speech to Text Transcription Challenge

Haritz Arzelus, Iván G. Torres, Juan Manuel Martín-Doñas, Ander González-Docasal, Aitor Alvarez

This paper describes the Vicomtech-UPM submission to the Albayzín-RTVE 2022 Speech to Text Transcription Challenge, which calls for automatic speech transcription systems to be evaluated on realistic TV shows. A total of four systems were built and submitted to the challenge: a primary system and three contrastive systems. Each system was built on a different architecture, with the aim of testing several state-of-the-art modelling approaches based on different learning techniques and types of neural networks. The primary system used the self-supervised Wav2vec2.0 model as the pre-trained model of the transcription engine. This model was fine-tuned with in-domain labelled data, and its initial hypotheses were rescored with a pruned 4-gram language model. The first contrastive system is a pruned RNN-Transducer model composed of a Conformer encoder and a stateless prediction network, with BPE word-pieces as output symbols. The second contrastive system is based on a Multistream-CNN acoustic model, with a non-pruned 3-gram model for decoding and an RNN-based language model for rescoring the initial lattices. Finally, the third contrastive system reports the results obtained with the publicly available Large model of the recently released Whisper engine, intended to serve as a reference benchmark for the other engines. Along with the description of the systems, the results obtained by each engine on the Albayzín-RTVE 2020 and 2022 test sets are presented.
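The rescoring step of the primary system ranks first-pass hypotheses by combining acoustic and language-model evidence. The sketch below illustrates that general idea on an n-best list; it is not the authors' implementation, and n-best rescoring is only one of several ways to apply an n-gram LM to a CTC model. Assumptions: each hypothesis is a hypothetical (text, acoustic log-probability) pair, the scores are combined log-linearly with hypothetical weights `lm_weight` and `word_bonus`, and a toy stupid-backoff 4-gram scorer (`StupidBackoffLM`, hypothetical) stands in for a properly smoothed, pruned 4-gram model.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

BOS, EOS = "<s>", "</s>"


class StupidBackoffLM:
    """Toy n-gram scorer using stupid backoff (hypothetical stand-in).

    Scores are not normalised probabilities, which is acceptable here
    because they are only used to rank competing hypotheses.
    """

    def __init__(self, sentences: List[str], order: int = 4, alpha: float = 0.4):
        self.order, self.alpha = order, alpha
        self.counts: Counter = Counter()
        for sentence in sentences:
            toks = self._pad(sentence)
            # Count every n-gram of order 1..order.
            for n in range(1, order + 1):
                for i in range(len(toks) - n + 1):
                    self.counts[tuple(toks[i:i + n])] += 1
        # Total unigram tokens (padding included) for the unigram base case.
        self.total = sum(c for k, c in self.counts.items() if len(k) == 1)

    def _pad(self, sentence: str) -> List[str]:
        return [BOS] * (self.order - 1) + sentence.split() + [EOS]

    def _score(self, history: Tuple[str, ...], word: str) -> float:
        if not history:
            # Unigram relative frequency; unseen words get a small hypothetical floor.
            return max(self.counts[(word,)], 0.5) / self.total
        ngram_count = self.counts[history + (word,)]
        if ngram_count > 0:
            return ngram_count / self.counts[history]
        # Drop the oldest history word and apply the fixed backoff penalty.
        return self.alpha * self._score(history[1:], word)

    def log_score(self, sentence: str) -> float:
        toks = self._pad(sentence)
        total = 0.0
        for i in range(self.order - 1, len(toks)):
            history = tuple(toks[i - self.order + 1:i])
            total += math.log(self._score(history, toks[i]))
        return total


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
    word_bonus: float = 0.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, acoustic_log_prob) pairs by am + w_lm * lm + bonus * words."""
    rescored = [
        (text, am + lm_weight * lm_score(text) + word_bonus * len(text.split()))
        for text, am in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    lm = StupidBackoffLM([
        "el tiempo en madrid",
        "el tiempo en barcelona",
        "hace buen tiempo en madrid",
    ])
    # The acoustic model alone slightly prefers the wrong hypothesis.
    nbest = [("el tiempo el madrid", -11.5), ("el tiempo en madrid", -12.0)]
    print(rescore_nbest(nbest, lm.log_score)[0][0])  # el tiempo en madrid
```

With `lm_weight=0.0` the acoustically preferred but ungrammatical "el tiempo el madrid" stays on top; with the LM term added, its unseen n-grams are penalised and the correct hypothesis wins. In a real system the weights would be tuned on a development set.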


doi: 10.21437/IberSPEECH.2022-54

Cite as: Arzelus, H., Torres, I.G., Martín-Doñas, J.M., González-Docasal, A., Alvarez, A. (2022) The Vicomtech-UPM Speech Transcription Systems for the Albayzín-RTVE 2022 Speech to Text Transcription Challenge. Proc. IberSPEECH 2022, 266-270, doi: 10.21437/IberSPEECH.2022-54

@inproceedings{arzelus22_iberspeech,
  author={Haritz Arzelus and Iván G. Torres and Juan Manuel Martín-Doñas and Ander González-Docasal and Aitor Alvarez},
  title={{The Vicomtech-UPM Speech Transcription Systems for the Albayzín-RTVE 2022 Speech to Text Transcription Challenge}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={266--270},
  doi={10.21437/IberSPEECH.2022-54}
}