LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition

Shoukang Hu, Xurong Xie, Shansong Liu, Max W.Y. Lam, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng


Discriminative training techniques define state-of-the-art performance for deep neural network (DNN) based speech recognition systems across a wide range of tasks. Conventional discriminative training methods produce deterministic DNN parameter estimates. These estimates are inherently prone to overfitting, which leads to poor generalization when training data is limited. To address this issue, this paper investigates the use of Bayesian learning and Gaussian Process (GP) based hidden activations to replace the deterministic parameter estimates of standard time delay neural network (TDNN) acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion. Experiments conducted on the Switchboard conversational telephone speech recognition task suggest that the proposed techniques consistently outperform the baseline LF-MMI trained TDNN systems, which use fixed-parameter hidden activations.


DOI: 10.21437/Interspeech.2019-2379

Cite as: Hu, S., Xie, X., Liu, S., Lam, M.W., Yu, J., Wu, X., Liu, X., Meng, H. (2019) LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition. Proc. Interspeech 2019, 2793-2797, DOI: 10.21437/Interspeech.2019-2379.


@inproceedings{Hu2019,
  author={Shoukang Hu and Xurong Xie and Shansong Liu and Max W.Y. Lam and Jianwei Yu and Xixin Wu and Xunying Liu and Helen Meng},
  title={{LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2793--2797},
  doi={10.21437/Interspeech.2019-2379},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2379}
}