ISCA Archive Interspeech 2013

Modular combination of deep neural networks for acoustic modeling

Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian Lane, Yajie Miao, Alex Waibel

In this work, we propose a modular combination of two popular applications of neural networks to large-vocabulary continuous speech recognition. First, a deep neural network is trained to extract bottleneck features from frames of mel-scale filterbank coefficients. As is commonly done for GMM/HMM systems, this network is then applied as a non-linear discriminative feature-space transformation, here for a hybrid setup in which acoustic modeling is performed by a deep belief network. This effectively results in a very large network in which the layers of the bottleneck network are fixed and applied to successive windows of feature frames in a time-delay fashion. We show that bottleneck features improve the recognition performance of DBN/HMM hybrids, and that the modular combination enables the acoustic model to benefit from a larger temporal context. Our architecture is evaluated on a recently released and challenging Tagalog corpus containing conversational telephone speech.
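
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: every dimension (40 filterbank coefficients, a 42-unit bottleneck, the context widths, 100 output states), the helper names `splice`, `init` and `mlp`, and the randomly initialized weights are assumptions chosen only for illustration, and plain feed-forward layers stand in for both the trained bottleneck network and the deep belief network. The sketch shows only how applying a fixed bottleneck network to successive windows, and then splicing its outputs again, widens the temporal context seen by the acoustic model.

```python
import numpy as np

def splice(frames, context):
    """Stack each frame with `context` neighbours on each side (edges repeated)."""
    num_frames = frames.shape[0]
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

def init(sizes, rng):
    """Random weights for a feed-forward net (a stand-in for trained parameters)."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Sigmoid hidden layers followed by a linear output layer."""
    for W, b in layers[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    W, b = layers[-1]
    return x @ W + b

rng = np.random.default_rng(0)
num_frames, n_mel = 50, 40            # assumed utterance length and filterbank size
fbank = rng.standard_normal((num_frames, n_mel))

# Stage 1: fixed bottleneck network, applied to a window around every frame.
bn_context, bn_dim = 5, 42            # assumed values
bn_net = init([n_mel * (2 * bn_context + 1), 256, bn_dim], rng)
bn_feats = mlp(splice(fbank, bn_context), bn_net)

# Stage 2: the acoustic model sees a window of successive bottleneck outputs.
am_context, n_states = 4, 100         # assumed values
am_net = init([bn_dim * (2 * am_context + 1), 256, n_states], rng)
logits = mlp(splice(bn_feats, am_context), am_net)
logits -= logits.max(axis=1, keepdims=True)
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Input frames that influence one output frame of the combined network.
effective_context = 2 * (bn_context + am_context) + 1
```

With these assumed widths, each output frame of the combined network depends on 19 input frames, although neither stage alone sees more than 11 frames of its own input.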


doi: 10.21437/Interspeech.2013-45

Cite as: Gehring, J., Lee, W., Kilgour, K., Lane, I., Miao, Y., Waibel, A. (2013) Modular combination of deep neural networks for acoustic modeling. Proc. Interspeech 2013, 94-98, doi: 10.21437/Interspeech.2013-45

@inproceedings{gehring13_interspeech,
  author={Jonas Gehring and Wonkyum Lee and Kevin Kilgour and Ian Lane and Yajie Miao and Alex Waibel},
  title={{Modular combination of deep neural networks for acoustic modeling}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={94--98},
  doi={10.21437/Interspeech.2013-45}
}