Combining Natural Gradient with Hessian Free Methods for Sequence Training

Adnan Haider, Philip Woodland


This paper presents a new optimisation approach for training Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface, enabling better paths on the parameter manifold to be traversed. The method has been applied within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. Its efficacy is demonstrated through experiments on a Multi-Genre Broadcast (MGB) transcription task, with networks using both sigmoid and ReLU activation functions investigated. It is shown that, for the same number of updates, the proposed approach achieves larger reductions in word error rate (WER) than both NG and HF alone, and also leads to a lower WER than standard stochastic gradient descent.
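The paper's exact combination rule is not reproduced on this page, but the two ingredients it names can be illustrated generically. The sketch below (NumPy, with synthetic matrices standing in for the Fisher matrix used by NG and the Gauss-Newton curvature used by HF) solves the damped curvature system by conjugate gradients, warm-started from the natural-gradient direction, which is one common way such information is blended in HF-style optimisers. The matrices, damping value, and warm-start choice here are illustrative assumptions, not the authors' method.

```python
import numpy as np

def conjugate_gradient(A_mv, b, x0, iters=50, tol=1e-12):
    """Standard CG for A x = b, given a matrix-vector oracle A_mv."""
    x = x0.copy()
    r = b - A_mv(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A_mv(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
n = 8
# Synthetic SPD stand-ins: F for the Fisher matrix (NG),
# G for the Gauss-Newton curvature matrix (HF).
M = rng.standard_normal((n, n)); F = M @ M.T + n * np.eye(n)
M = rng.standard_normal((n, n)); G = M @ M.T + n * np.eye(n)
g = rng.standard_normal(n)       # gradient of the sequence loss

ng_dir = np.linalg.solve(F, g)   # natural-gradient direction F^{-1} g
lam = 0.1                        # damping (assumed value)
A_mv = lambda v: G @ v + lam * v

# HF-style step: CG on the damped curvature system, warm-started
# from the NG direction so curvature refines the NG step.
d = conjugate_gradient(A_mv, g, x0=ng_dir)
update = -d                      # descent direction for the parameters
```

In practice the curvature-vector product `A_mv` would be computed matrix-free via an R-operator over the network rather than with an explicit matrix.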


 DOI: 10.21437/Interspeech.2018-2335

Cite as: Haider, A., Woodland, P. (2018) Combining Natural Gradient with Hessian Free Methods for Sequence Training. Proc. Interspeech 2018, 2918-2922, DOI: 10.21437/Interspeech.2018-2335.


@inproceedings{Haider2018,
  author={Adnan Haider and Philip Woodland},
  title={Combining Natural Gradient with Hessian Free Methods for Sequence Training},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2918--2922},
  doi={10.21437/Interspeech.2018-2335},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2335}
}