ISCA Archive Interspeech 2013

A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR

Zhi-Jie Yan, Qiang Huo, Jian Xu

We present a new scalable approach to using deep neural network (DNN) derived features in Gaussian mixture density hidden Markov model (GMM-HMM) based acoustic modeling for large vocabulary continuous speech recognition (LVCSR). To mitigate the scalability issue of DNN training, the DNN-based feature extractor is trained on a subset of the training data, while the GMM-HMMs are trained on the whole training set using state-of-the-art scalable training methods and tools. In a benchmark evaluation, we first used 309 hours of Switchboard-I (SWB) training data to train a DNN, which achieves a word error rate (WER) of 15.4% on the NIST-2000 Hub5 evaluation set in a conventional DNN-HMM system. When the same DNN is used as a feature extractor and 2,000 hours of "SWB+Fisher" training data are used to train the GMM-HMMs, our DNN-GMM-HMM approach achieves a WER of 13.8%. With per-conversation-side unsupervised adaptation, the WER is further reduced to 13.1%.
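To put the three reported WERs in perspective, the relative reductions can be computed directly from the numbers in the abstract: relative reduction = 100 × (baseline − new) / baseline. The sketch below is only an illustration of that arithmetic; the dictionary labels and the helper name `relative_reduction` are hypothetical and not part of the paper.

```python
# WERs (%) on the NIST-2000 Hub5 evaluation set, as reported in the abstract.
WER = {
    "DNN-HMM (309 h SWB)": 15.4,
    "DNN-GMM-HMM (2,000 h SWB+Fisher)": 13.8,
    "DNN-GMM-HMM + unsupervised adaptation": 13.1,
}


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    baseline = WER["DNN-HMM (309 h SWB)"]
    # Each system compared against the DNN-HMM baseline.
    for system, value in WER.items():
        print(f"{system}: {value:.1f}% WER, "
              f"{relative_reduction(baseline, value):.1f}% relative to DNN-HMM")
    # Gain from adaptation alone, relative to the unadapted DNN-GMM-HMM system.
    unadapted = WER["DNN-GMM-HMM (2,000 h SWB+Fisher)"]
    adapted = WER["DNN-GMM-HMM + unsupervised adaptation"]
    print(f"Adaptation gain: {relative_reduction(unadapted, adapted):.1f}% relative")
```

Under this arithmetic, moving to DNN-derived features with GMM-HMMs trained on the full 2,000 hours corresponds to roughly a 10.4% relative WER reduction over the DNN-HMM system, adaptation adds about 5.1% relative on top of that, and the combined gain is about 14.9% relative.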


doi: 10.21437/Interspeech.2013-47

Cite as: Yan, Z.-J., Huo, Q., Xu, J. (2013) A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. Proc. Interspeech 2013, 104-108, doi: 10.21437/Interspeech.2013-47

@inproceedings{yan13_interspeech,
  author={Zhi-Jie Yan and Qiang Huo and Jian Xu},
  title={{A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={104--108},
  doi={10.21437/Interspeech.2013-47}
}