Gaussian Process Neural Networks for Speech Recognition

Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng


Deep neural networks (DNNs) play an important role in state-of-the-art speech recognition systems. One important issue associated with DNNs, and artificial neural networks in general, is the selection of suitable model structures, for example, the form of hidden node activation functions to use. Due to the lack of automatic model selection techniques, the choice of activation functions has been largely empirical. In addition, the use of deterministic, fixed-point parameter estimates is prone to over-fitting when given limited training data. In order to model both structural and parametric uncertainty, this paper proposes a novel form of DNN architecture, Gaussian process neural networks (GPNNs), which use non-parametric activation functions based on Gaussian processes (GPs). Initial experiments conducted on the ARPA Resource Management task suggest that the proposed GPNN acoustic models outperformed the baseline sigmoid-activation DNN by 3.40% to 24.25% relative in terms of word error rate. Consistent performance improvements over the DNN baseline were also obtained when varying the number of hidden nodes and the number of spectral basis functions.
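The abstract mentions activation functions built from a number of spectral basis functions. One common way to realise a GP-style activation is to expand the pre-activation in random Fourier (spectral) features of an RBF kernel and learn the combination weights; the sketch below illustrates that idea in NumPy. The class name, parameterisation, and initialisation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of a non-parametric activation using spectral
# (random Fourier) basis functions:
#   a(z) = sum_k theta_k * sqrt(2/K) * cos(w_k * z + b_k)
# where w_k ~ N(0, 1) are spectral frequencies (RBF kernel spectrum),
# b_k ~ U[0, 2*pi] are random phases, and theta_k are trainable weights.
class SpectralActivation:
    def __init__(self, num_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal(num_basis)          # spectral frequencies
        self.b = rng.uniform(0.0, 2 * np.pi, num_basis)  # random phases
        # Combination weights; these would be learned jointly with the DNN.
        self.theta = rng.standard_normal(num_basis) / np.sqrt(num_basis)
        self.num_basis = num_basis

    def __call__(self, z):
        # z: array of pre-activations; returns elementwise activation values.
        z = np.asarray(z, dtype=float)
        # Basis matrix of shape (..., num_basis).
        phi = np.sqrt(2.0 / self.num_basis) * np.cos(
            np.multiply.outer(z, self.w) + self.b)
        return phi @ self.theta
```

In this view, increasing `num_basis` corresponds to varying the number of spectral basis functions reported in the experiments, and learning `theta` is what makes the activation shape data-driven rather than fixed, in contrast to a sigmoid.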


 DOI: 10.21437/Interspeech.2018-1823

Cite as: Lam, M.W.Y., Hu, S., Xie, X., Liu, S., Yu, J., Su, R., Liu, X., Meng, H. (2018) Gaussian Process Neural Networks for Speech Recognition. Proc. Interspeech 2018, 1778-1782, DOI: 10.21437/Interspeech.2018-1823.


@inproceedings{Lam2018,
  author={Max W. Y. Lam and Shoukang Hu and Xurong Xie and Shansong Liu and Jianwei Yu and Rongfeng Su and Xunying Liu and Helen Meng},
  title={Gaussian Process Neural Networks for Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1778--1782},
  doi={10.21437/Interspeech.2018-1823},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1823}
}