The use of Deep Belief Networks (DBN) to pretrain neural networks has recently led to a resurgence in the use of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any previously reported with DBN-pretrained ANN/HMM systems: 5870 hours of Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute WER, while on the second dataset it outperforms the GMM/HMM baseline by 2.9% absolute. Maximum Mutual Information (MMI) fine-tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% absolute on the first dataset, and 0.6% and 1.1% absolute on the second dataset.
Index Terms: Deep Belief Networks, Acoustic Modeling, Artificial Neural Network, ANN/HMM
Bibliographic reference: Jaitly, Navdeep / Nguyen, Patrick / Senior, Andrew / Vanhoucke, Vincent (2012): "Application of pretrained deep neural networks to large vocabulary speech recognition", in INTERSPEECH-2012, 2578-2581.