INTERSPEECH 2014

In this paper, we propose the variable-component DNN (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). This method is inspired by the variable-parameter HMM (VP-HMM), in which the variation of model parameters is modeled as a set of polynomial functions of the environmental signal-to-noise ratio (SNR), and during testing the model parameters are recomputed according to the estimated testing SNR. In VCDNN, we refine two types of DNN components: (1) the weight matrix and bias, and (2) the output of each layer. Experimental results on the Aurora 4 task show that VCDNN achieved 6.53% and 5.92% relative word error rate reduction (WERR) over the standard DNN for the two methods, respectively. Under unseen SNR conditions, VCDNN gave even better results (8.46% relative WERR for the DNN with varying matrix and bias, 7.08% relative WERR for the DNN with varying layer output). Moreover, VCDNN with 1024 units per hidden layer beats the standard DNN with 2048 units per hidden layer, with a 3.22% relative WERR and half the computational/memory cost, showing a superior ability to produce sharper and more compact models.
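The core idea described above can be sketched as follows: a DNN component (here, a weight matrix) is expressed as a polynomial function of the environmental SNR, W(snr) = Σ_k H_k · snr^k, and recomputed at test time from the estimated SNR. This is a minimal illustrative sketch of that formulation; the function name, shapes, and polynomial degree are assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of the variable-component idea: a DNN weight matrix
# is modeled as a polynomial of the SNR and rebuilt for the estimated
# test-time SNR. Nested lists stand in for real weight tensors.

def variable_weight(H, snr):
    """Combine polynomial coefficient matrices H = [H_0, ..., H_K]
    (all the same shape) into W(snr) = sum_k H_k * snr**k."""
    rows, cols = len(H[0]), len(H[0][0])
    W = [[0.0] * cols for _ in range(rows)]
    for k, Hk in enumerate(H):
        scale = snr ** k
        for i in range(rows):
            for j in range(cols):
                W[i][j] += Hk[i][j] * scale
    return W

# Degree-1 example: W(snr) = H_0 + H_1 * snr
H = [[[1.0, 2.0]], [[0.1, 0.2]]]
print(variable_weight(H, 10.0))  # [[2.0, 4.0]]
```

At recognition time, only the low-order coefficient matrices need to be stored; the SNR-specific weights are synthesized on the fly, which is what allows the model to adapt to unseen SNR conditions.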
Bibliographic reference. Zhao, Rui / Li, Jinyu / Gong, Yifan (2014): "Variable-component deep neural network for robust speech recognition", In INTERSPEECH-2014, 2719-2723.