Backstitch: Counteracting Finite-Sample Bias via Negative Steps

Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur


In this paper we describe a modification to Stochastic Gradient Descent (SGD) that improves generalization to unseen data. It consists of performing two steps for each minibatch: a backward step with a small negative learning rate, followed by a forward step with a larger learning rate. The idea was initially inspired by adversarial training, but we show that it can be viewed as a crude way of canceling out certain systematic biases that arise from training on finite data sets. The method gives approximately 10% relative improvement over our best acoustic models based on lattice-free MMI, across multiple datasets with 100–300 hours of data.
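A minimal sketch of the two-step update described above, written against plain SGD on a loss gradient. The function and parameter names (`backstitch_sgd_step`, `grad_fn`, the scale `alpha`) are our assumptions for illustration, not identifiers from the paper, and the default `alpha=0.3` is a placeholder that would need tuning per setup:

```python
import numpy as np

def backstitch_sgd_step(params, grad_fn, minibatch, lr, alpha=0.3):
    """One backstitch update: a negative step, then a larger positive step."""
    # Backward step: move *against* the usual descent direction,
    # i.e. an effective learning rate of -alpha * lr.
    params = params + alpha * lr * grad_fn(params, minibatch)
    # Forward step: recompute the gradient at the new point on the
    # same minibatch, then take a larger descent step of (1 + alpha) * lr.
    params = params - (1.0 + alpha) * lr * grad_fn(params, minibatch)
    return params

# Toy usage: least-squares regression on a single fixed "minibatch" (X, y).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
    grad = lambda w, batch: 2 * batch[0].T @ (batch[0] @ w - batch[1]) / len(batch[1])
    w = np.zeros(4)
    for _ in range(200):
        w = backstitch_sgd_step(w, grad, (X, y), lr=0.05)
    print(w)
```

Note that each backstitch update evaluates the gradient twice on the same minibatch, so the per-step cost is roughly double that of plain SGD.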


DOI: 10.21437/Interspeech.2017-1323

Cite as: Wang, Y., Peddinti, V., Xu, H., Zhang, X., Povey, D., Khudanpur, S. (2017) Backstitch: Counteracting Finite-Sample Bias via Negative Steps. Proc. Interspeech 2017, 1631-1635, DOI: 10.21437/Interspeech.2017-1323.


@inproceedings{Wang2017,
  author={Yiming Wang and Vijayaditya Peddinti and Hainan Xu and Xiaohui Zhang and Daniel Povey and Sanjeev Khudanpur},
  title={Backstitch: Counteracting Finite-Sample Bias via Negative Steps},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={1631--1635},
  doi={10.21437/Interspeech.2017-1323},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1323}
}