Sigma-UPM ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge

Juan M. Perero-Codosero, Fernando M. Espinoza-Cuadros, Luis A. Hernández-Gómez

This paper describes the Sigma-UPM Automatic Speech Recognition (ASR) systems submitted to the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge. Deep Neural Networks (DNNs) are currently the most promising technology for ASR. In recent years, traditional hybrid models have been evaluated and compared with end-to-end ASR systems in terms of accuracy and efficiency. For this Challenge, we contribute two different approaches: a primary hybrid ASR system based on DNN-HMM, and two contrastive state-of-the-art end-to-end ASR systems based on lattice-free maximum mutual information (LF-MMI). Our analysis of the results from the previous edition led us to conclude that some adaptation was needed to improve the systems’ performance. In particular, data augmentation techniques and Domain Adversarial Training (DAT) have been applied to the aforementioned approaches. Experiments were carried out using 6 hours of the dev1 and dev2 partitions of the RTVE2018 Database. Multi-condition data augmentation applied to our hybrid DNN-HMM models yielded word error rate (WER) improvements in noisy scenarios (about 10% relative). In contrast, the results obtained with an end-to-end Pychain-based ASR system fell short of our expectations. Nevertheless, including DAT techniques gave a relative WER improvement of 2.87% over the Pychain-based baseline system.
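The abstract reports its gains as relative WER improvements. As a minimal sketch of how such figures are conventionally computed (not the authors' scoring pipeline), the Python below calculates WER from a word-level edit distance and then the relative improvement of one system over a baseline. The function names `wer` and `relative_improvement` and the example sentences are hypothetical and chosen only for illustration; no text normalization is applied, which a real evaluation would normally include.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system with respect to a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative sentences only, not taken from the RTVE2018 Database.
    print(wer("el tiempo en madrid", "el tiempo madrid"))
    # Illustrative WER values only, not results from the paper.
    print(relative_improvement(20.0, 18.0))
```

Under this definition, a relative improvement of 2.87% means the system's WER is 2.87% lower than the baseline's WER as a fraction of the baseline, not 2.87 absolute percentage points.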

doi: 10.21437/IberSPEECH.2021-23

Perero-Codosero, J.M., Espinoza-Cuadros, F.M., Hernández-Gómez, L.A. (2021) Sigma-UPM ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge. Proc. IberSPEECH 2021, 108-112, doi: 10.21437/IberSPEECH.2021-23.