Exploring E2E speech recognition systems for new languages

Conrad Bernath, Aitor Alvarez, Haritz Arzelus, Carlos David Martínez

Over the last few years, advances in both machine learning algorithms and computer hardware have led to significant improvements in speech recognition technology, mainly through the use of Deep Learning paradigms. As has been amply demonstrated in different studies, Deep Neural Networks (DNNs) have already outperformed traditional Gaussian Mixture Models (GMMs) at acoustic modeling in combination with Hidden Markov Models (HMMs). More recently, new attempts have focused on building end-to-end (E2E) speech recognition architectures, especially for high-resource languages such as English and Chinese, with the aim of surpassing the performance of DNN-HMM and more conventional systems. The aim of this work is, first, to present the different techniques that have been applied to enhance state-of-the-art E2E systems for American English using publicly available datasets. Second, we describe the construction of E2E systems for Spanish and Basque, and explain the strategies applied to overcome the limited availability of training data, especially for Basque as a low-resource language. In the evaluation phase, the three E2E systems are also compared with DNN-HMM based recognition engines built and tested with the same datasets.

DOI: 10.21437/IberSPEECH.2018-22

Cite as: Bernath, C., Alvarez, A., Arzelus, H., Martínez, C.D. (2018) Exploring E2E speech recognition systems for new languages. Proc. IberSPEECH 2018, 102-106, DOI: 10.21437/IberSPEECH.2018-22.

  author={Conrad Bernath and Aitor Alvarez and Haritz Arzelus and Carlos David Martínez},
  title={{Exploring E2E speech recognition systems for new languages}},
  booktitle={Proc. IberSPEECH 2018},