ISCA Archive Interspeech 2017

Backstitch: Counteracting Finite-Sample Bias via Negative Steps

Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur

In this paper we describe a modification to Stochastic Gradient Descent (SGD) that improves generalization to unseen data. It consists of two steps for each minibatch: a backward step with a small negative learning rate, followed by a forward step with a larger learning rate. The method was initially inspired by adversarial training, but we show that it can be viewed as a crude way of canceling out certain systematic biases that arise from training on finite data sets. The method gives ~10% relative improvement over our best acoustic models based on lattice-free MMI, across multiple datasets with 100–300 hours of data.
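The update described above can be sketched as follows; this is a minimal illustration on a toy quadratic loss, where the backstitch scale `alpha` and learning rate `nu` are illustrative values chosen here, not ones taken from the paper:

```python
import numpy as np

def grad(theta):
    # Gradient of the toy loss L(theta) = 0.5 * ||theta||^2.
    return theta

def backstitch_step(theta, nu=0.1, alpha=0.3):
    # 1) Backward step: move *up* the gradient with a small
    #    negative learning rate, i.e. step size -alpha * nu.
    theta = theta + alpha * nu * grad(theta)
    # 2) Forward step: recompute the gradient at the new point
    #    and take a larger step of size (1 + alpha) * nu.
    theta = theta - (1 + alpha) * nu * grad(theta)
    return theta

theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = backstitch_step(theta)
print(theta)  # approaches the minimum at the origin
```

Because the second gradient is evaluated after the backward step, the net update differs from a plain SGD step whenever the gradient varies around the current point, which is where the bias-cancellation effect comes from.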

doi: 10.21437/Interspeech.2017-1323

Cite as: Wang, Y., Peddinti, V., Xu, H., Zhang, X., Povey, D., Khudanpur, S. (2017) Backstitch: Counteracting Finite-Sample Bias via Negative Steps. Proc. Interspeech 2017, 1631-1635, doi: 10.21437/Interspeech.2017-1323

@inproceedings{wang2017backstitch,
  author={Yiming Wang and Vijayaditya Peddinti and Hainan Xu and Xiaohui Zhang and Daniel Povey and Sanjeev Khudanpur},
  title={{Backstitch: Counteracting Finite-Sample Bias via Negative Steps}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1631--1635},
  doi={10.21437/Interspeech.2017-1323}
}