An Investigation of Mixup Training Strategies for Acoustic Models in ASR

Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin, Alexander Zatvornitskiy


Mixup is a recently proposed technique that creates virtual training examples by combining existing ones. It has been successfully used in various machine learning tasks. This paper focuses on applying mixup to automatic speech recognition (ASR). More specifically, several strategies for acoustic model training are investigated, including both conventional cross-entropy and novel lattice-free MMI models. Considering mixup as a method of data augmentation as well as regularization, we compare it with the widely used speed perturbation and dropout techniques. Experiments on the Switchboard-1, AMI and TED-LIUM datasets show consistent word error rate improvements of up to 13% relative. Moreover, mixup is found to be particularly effective on test data mismatched to the training data.
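In the original mixup formulation (Zhang et al., 2017), a virtual example is a convex combination of two training examples and of their labels, with the mixing weight drawn from a Beta distribution. A minimal sketch for acoustic feature and target vectors might look like the following; the function name, the `alpha` value, and the use of one-hot targets are illustrative, not taken from the paper:

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.5, rng=None):
    """Create one virtual example from two real ones (mixup sketch).

    x1, x2 : feature vectors (e.g. acoustic feature frames)
    y1, y2 : target vectors (e.g. one-hot senone labels)
    alpha  : Beta-distribution parameter; the value here is illustrative.
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing weight in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2       # interpolated features
    y = lam * y1 + (1.0 - lam) * y2       # interpolated soft targets
    return x, y
```

Because the same weight is applied to features and targets, the soft target remains a valid distribution whenever the inputs are: mixing two one-hot vectors yields a vector that still sums to one.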


DOI: 10.21437/Interspeech.2018-2191

Cite as: Medennikov, I., Khokhlov, Y., Romanenko, A., Popov, D., Tomashenko, N., Sorokin, I., Zatvornitskiy, A. (2018) An Investigation of Mixup Training Strategies for Acoustic Models in ASR. Proc. Interspeech 2018, 2903-2907, DOI: 10.21437/Interspeech.2018-2191.


@inproceedings{Medennikov2018,
  author={Ivan Medennikov and Yuri Khokhlov and Aleksei Romanenko and Dmitry Popov and Natalia Tomashenko and Ivan Sorokin and Alexander Zatvornitskiy},
  title={An Investigation of Mixup Training Strategies for Acoustic Models in ASR},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2903--2907},
  doi={10.21437/Interspeech.2018-2191},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2191}
}