Recognition of Latin American Spanish Using Multi-Task Learning

Carlos Mendes, Alberto Abad, João Paulo Neto, Isabel Trancoso


In the broadcast news domain, nationwide newscasters typically interact with communities that have a diverse set of accents. One of the challenges in speech recognition is the performance degradation in the presence of these diverse conditions. Performance degrades further when the accents come from other countries that share the same language. Extensive work has been conducted on this topic for languages such as English and Mandarin. Recently, TDNN-based multi-task learning has received some attention in this area, with interesting results, typically using models trained on a variety of accented corpora from a particular language. In this work, we look at the case of LATAM (Latin American) Spanish because of its distinctive accent variations. Because LATAM Spanish has historically been influenced by non-Spanish European migrations, we anticipated that LATAM speech recognition performance could be further improved by including these influential languages during TDNN-based multi-task training. Experiments show that a training setup that includes such languages outperforms the single-task acoustic model baseline. We also propose an automatic per-language weight selection strategy to regularize each language's contribution during multi-task training.
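As a rough illustration of how per-language weights can regularize each language's contribution, the sketch below combines per-language losses into a single weighted multi-task objective. It is a minimal sketch under stated assumptions, not the method of the paper: the language keys, the probability values, and the fixed weights are hypothetical, and the automatic weight selection strategy is not reproduced here.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def multitask_loss(per_language_losses, weights):
    """Weighted sum of per-language losses (both dicts keyed by language)."""
    return sum(weights[lang] * loss for lang, loss in per_language_losses.items())


# Hypothetical per-language losses for one training step (illustrative values).
losses = {
    "latam_spanish": cross_entropy([0.7, 0.2, 0.1], 0),       # main task
    "auxiliary_language": cross_entropy([0.5, 0.3, 0.2], 0),  # auxiliary task
}

# Hypothetical fixed weights: the auxiliary language contributes less.
weights = {"latam_spanish": 1.0, "auxiliary_language": 0.3}

total = multitask_loss(losses, weights)
```

With these weights, the auxiliary language influences the shared layers less than the main task does; setting its weight to 0 would recover single-task training on the main language.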


DOI: 10.21437/Interspeech.2019-2772

Cite as: Mendes, C., Abad, A., Neto, J.P., Trancoso, I. (2019) Recognition of Latin American Spanish Using Multi-Task Learning. Proc. Interspeech 2019, 2135-2139, DOI: 10.21437/Interspeech.2019-2772.


@inproceedings{Mendes2019,
  author={Carlos Mendes and Alberto Abad and João Paulo Neto and Isabel Trancoso},
  title={{Recognition of Latin American Spanish Using Multi-Task Learning}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2135--2139},
  doi={10.21437/Interspeech.2019-2772},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2772}
}