Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition

Natalia Tomashenko, Yuri Khokhlov, Yannick Estève


This work investigates speaker adaptation and regularization techniques for deep neural network acoustic models (AMs) in automatic speech recognition (ASR) systems. In previous work, GMM-derived (GMMD) features have been shown to be effective for neural network AM adaptation. In this paper, we propose and investigate a novel way to improve speaker adaptive training (SAT) for neural network AMs using GMMD features. The idea is to use inaccurate transcriptions from ASR for adaptation during neural network training, while keeping the exact transcriptions as targets for the neural networks. In addition, we apply the mixup technique, recently proposed for classification tasks, to acoustic models for ASR and investigate its impact on speaker-adapted acoustic models. Experimental results on the TED-LIUM corpus show that the proposed approaches provide an additional gain in speech recognition performance in comparison with the speaker-adapted AMs.
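Mixup regularizes training by interpolating pairs of examples and their labels with a weight drawn from a Beta distribution. A minimal sketch of batch-level mixup for acoustic model training follows; the function name, the one-hot target representation, and the `alpha` value are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mixup_batch(features, targets, alpha=0.4, rng=None):
    """Mix a minibatch of acoustic feature vectors with their targets.

    features: (batch, dim) array of acoustic features.
    targets:  (batch, n_classes) array of one-hot labels.
    alpha:    Beta-distribution parameter (illustrative value).
    Returns interpolated (features, targets) of the same shapes.
    """
    rng = rng or np.random.default_rng(0)
    # Single mixing weight for the whole batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Pair each example with a randomly permuted partner from the same batch.
    perm = rng.permutation(len(features))
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_x, mixed_y
```

The mixed targets remain valid probability distributions (each row still sums to 1), so the usual cross-entropy loss can be applied unchanged.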


DOI: 10.21437/Interspeech.2018-2209

Cite as: Tomashenko, N., Khokhlov, Y., Estève, Y. (2018) Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition. Proc. Interspeech 2018, 2414-2418, DOI: 10.21437/Interspeech.2018-2209.


@inproceedings{Tomashenko2018,
  author={Natalia Tomashenko and Yuri Khokhlov and Yannick Estève},
  title={Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2414--2418},
  doi={10.21437/Interspeech.2018-2209},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2209}
}