Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features

William Hartmann, Roger Hsiao, Tim Ng, Jeff Ma, Francis Keith, Man-Hung Siu


On small datasets, discriminatively trained bottleneck features from deep networks commonly outperform more traditional spectral or cepstral features. While these features are typically trained with small, fully-connected networks, recent studies have used more sophisticated networks with great success. We use the recent deep CNN (VGG) network for bottleneck feature extraction, previously used only for low-resource tasks, and apply it to the Switchboard English conversational telephone speech task. Unlike features derived from traditional MLP networks, the VGG features outperform cepstral features even when used with BLSTM acoustic models trained on large amounts of data. We achieve the best BBN single system performance when combining the VGG features with a BLSTM acoustic model. When decoding with an n-gram language model, the kind used in deployable systems, we obtain a realistic production system with a WER of 7.4%. This result is competitive with the current state of the art in the literature. While our focus is on realistic single system performance, we further reduce the WER to 6.1% through system combination and expensive neural network language model rescoring.


DOI: 10.21437/Interspeech.2017-1513

Cite as: Hartmann, W., Hsiao, R., Ng, T., Ma, J., Keith, F., Siu, M. (2017) Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. Proc. Interspeech 2017, 112-116, DOI: 10.21437/Interspeech.2017-1513.


@inproceedings{Hartmann2017,
  author={William Hartmann and Roger Hsiao and Tim Ng and Jeff Ma and Francis Keith and Man-Hung Siu},
  title={Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={112--116},
  doi={10.21437/Interspeech.2017-1513},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1513}
}