Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition

Souvik Kundu, Khe Chai Sim, Mark J.F. Gales


It is difficult to apply well-formulated, model-based noise adaptation approaches to Deep Neural Networks (DNNs) due to the lack of interpretability of their model parameters. In this paper, we propose incorporating a generative front-end layer (GFL), parameterised by a Gaussian Mixture Model (GMM), into the DNN. A GFL can be easily adapted to different noise conditions by applying the model-based Vector Taylor Series (VTS) method to the underlying GMM. We show that incorporating a GFL into the DNN yields a 12.1% relative improvement over a baseline multi-condition DNN. We also show that the proposed system performs significantly better than the noise-aware training method, in which per-utterance estimated noise parameters are appended to the acoustic features.
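To make the idea concrete, below is a minimal numpy sketch of a GMM-parameterised front-end layer feeding a small feed-forward DNN, with a crude per-utterance noise compensation of the GMM means. The class and function names, the diagonal-covariance Gaussians, the sigmoid hidden layer, and the simplified log-spectral-domain mean shift are illustrative assumptions, not the authors' exact GFL or VTS formulation.

```python
import numpy as np

class GenerativeFrontEndLayer:
    """Illustrative GMM-based front-end layer: maps an acoustic frame to
    per-component weighted log-likelihoods (diagonal-covariance Gaussians)."""

    def __init__(self, means, variances, weights):
        self.means = means          # (K, D) component means
        self.variances = variances  # (K, D) diagonal covariances
        self.weights = weights      # (K,)  mixture weights

    def forward(self, x):
        # Log-likelihood of frame x under each Gaussian component.
        diff = x - self.means                                      # (K, D)
        log_det = np.sum(np.log(2 * np.pi * self.variances), axis=1)
        log_lik = -0.5 * (log_det + np.sum(diff ** 2 / self.variances, axis=1))
        return np.log(self.weights) + log_lik                      # (K,)

    def noise_compensate(self, noise_mean):
        # Crude stand-in for VTS mean compensation in the log-spectral domain:
        # mu_y ~= mu_x + log(1 + exp(mu_n - mu_x)) for additive noise only.
        self.means = self.means + np.log1p(np.exp(noise_mean - self.means))


def dnn_forward(h, weights, biases):
    """Plain feed-forward pass over the GFL output (sigmoid hidden layers)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(W @ h + b)))
    return weights[-1] @ h + biases[-1]   # unnormalised state scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K, H, S = 40, 64, 128, 10   # feature dim, GMM components, hidden units, states
    gfl = GenerativeFrontEndLayer(
        means=rng.normal(size=(K, D)),
        variances=np.ones((K, D)),
        weights=np.full(K, 1.0 / K),
    )
    # Adapt the front end with a per-utterance noise estimate (hypothetical values).
    gfl.noise_compensate(noise_mean=rng.normal(scale=0.1, size=D))
    frame = rng.normal(size=D)
    logits = dnn_forward(
        gfl.forward(frame),
        weights=[rng.normal(scale=0.1, size=(H, K)), rng.normal(scale=0.1, size=(S, H))],
        biases=[np.zeros(H), np.zeros(S)],
    )
    print(logits.shape)  # (10,)
```

The point of the sketch is the division of labour implied by the abstract: only the generative layer's GMM parameters are touched when the noise condition changes, while the DNN weights above it stay fixed.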


DOI: 10.21437/Interspeech.2016-760

Cite as

Kundu, S., Sim, K.C., Gales, M.J.F. (2016) Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition. Proc. Interspeech 2016, 2359-2363.

Bibtex
@inproceedings{Kundu+2016,
author={Souvik Kundu and Khe Chai Sim and Mark J.F. Gales},
title={Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition},
year={2016},
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-760},
url={http://dx.doi.org/10.21437/Interspeech.2016-760},
pages={2359--2363}
}