In this paper we describe a modification to Stochastic Gradient Descent (SGD) that improves generalization to unseen data. It consists of two steps for each minibatch: a backward step with a small negative learning rate, followed by a forward step with a larger positive learning rate. The method was initially inspired by adversarial training, but we show that it can also be viewed as a crude way of canceling out certain systematic biases that arise from training on finite data sets. The method gives approximately 10% relative improvement over our best acoustic models based on lattice-free MMI, across multiple datasets with 100–300 hours of data.
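To make the two-step update concrete, below is a minimal sketch of a backstitch-style SGD step in Python. The abstract only specifies a small negative step followed by a larger positive step on the same minibatch; the particular coefficients used here (a backward step of alpha * lr and a forward step of (1 + alpha) * lr, with a default alpha of 0.3) and the function and parameter names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def backstitch_sgd_step(theta, grad_fn, lr, alpha=0.3):
    """One backstitch-style update on a single minibatch.

    theta   : current parameter vector (numpy array)
    grad_fn : function returning the minibatch gradient at a given theta
    lr      : the usual (positive) learning rate
    alpha   : backstitch scale; 0.3 is an assumed illustrative default
    """
    # Backward step: move a small amount *up* the gradient,
    # i.e. apply a small negative learning rate.
    theta = theta + alpha * lr * grad_fn(theta)
    # Forward step: recompute the gradient on the same minibatch and take
    # a larger-than-usual step down the gradient.
    theta = theta - (1.0 + alpha) * lr * grad_fn(theta)
    return theta

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = backstitch_sgd_step(theta, grad_fn=lambda x: x, lr=0.1)
print(theta)  # approaches the minimum at the origin
```

In this sketch the net displacement per minibatch is close to an ordinary SGD step, but the extra backward/forward pair perturbs the parameters before the main update, which is the mechanism the abstract credits with counteracting finite-sample bias.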
Cite as: Wang, Y., Peddinti, V., Xu, H., Zhang, X., Povey, D., Khudanpur, S. (2017) Backstitch: Counteracting Finite-Sample Bias via Negative Steps. Proc. Interspeech 2017, 1631-1635, doi: 10.21437/Interspeech.2017-1323
@inproceedings{wang17h_interspeech,
  author={Yiming Wang and Vijayaditya Peddinti and Hainan Xu and Xiaohui Zhang and Daniel Povey and Sanjeev Khudanpur},
  title={{Backstitch: Counteracting Finite-Sample Bias via Negative Steps}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1631--1635},
  doi={10.21437/Interspeech.2017-1323}
}