15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Variable-Component Deep Neural Network for Robust Speech Recognition

Rui Zhao (1), Jinyu Li (2), Yifan Gong (2)

(1) Microsoft, China
(2) Microsoft, USA

In this paper, we propose the variable-component DNN (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). The method is inspired by the variable-parameter HMM (VPHMM), in which the variation of the model parameters is modeled as a set of polynomial functions of the environmental signal-to-noise ratio (SNR), and the model parameters are recomputed at test time according to the estimated test SNR. In VCDNN, we make two types of DNN components SNR-dependent: (1) the weighting matrix and bias, and (2) the output of each layer. Experimental results on the Aurora4 task show that VCDNN achieved 6.53% and 5.92% relative word error rate reduction (WERR) over the standard DNN for the two methods, respectively. Under unseen SNR conditions, VCDNN gave even better results (8.46% WERR for the DNN with varying matrix and bias, 7.08% WERR for the DNN with varying layer output). Moreover, VCDNN with 1024 units per hidden layer outperforms the standard DNN with 2048 units per hidden layer by 3.22% WERR while halving the computational and memory cost, showing a superior ability to produce sharper and more compact models.
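The two variants named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the VPHMM-style polynomial form carries over to the DNN, that the SNR is a single normalized scalar per utterance, and that the hidden units are sigmoid. The function names (`poly_eval`, `vp_layer`, `vo_layer`) and the coefficient layout are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def poly_eval(coeffs, snr):
    """Evaluate sum_j coeffs[j] * snr**j.

    coeffs has shape (J+1, ...): one array per polynomial order (assumed layout).
    snr is a scalar, assumed to be already normalized.
    """
    powers = float(snr) ** np.arange(coeffs.shape[0])
    return np.tensordot(powers, coeffs, axes=1)


def vp_layer(x, W_coeffs, b_coeffs, snr):
    """Variant (1): the weighting matrix and bias are polynomials of the SNR.

    They are recomputed from the estimated SNR, then used as in a standard layer.
    """
    W = poly_eval(W_coeffs, snr)  # shape (out, in)
    b = poly_eval(b_coeffs, snr)  # shape (out,)
    return sigmoid(W @ x + b)


def vo_layer(x, W_coeffs, b_coeffs, snr):
    """Variant (2): the layer output is a polynomial of the SNR.

    Assumption: each polynomial order has its own standard sub-layer, and the
    sub-layer outputs are combined with the SNR powers as weights.
    """
    outs = np.stack([sigmoid(W @ x + b) for W, b in zip(W_coeffs, b_coeffs)])
    return poly_eval(outs, snr)


# Example: a first-order polynomial for a layer with 3 inputs and 2 outputs.
rng = np.random.default_rng(0)
W_coeffs = rng.normal(size=(2, 2, 3))  # orders 0 and 1
b_coeffs = rng.normal(size=(2, 2))
x = rng.normal(size=3)
h_vp = vp_layer(x, W_coeffs, b_coeffs, snr=0.5)
h_vo = vo_layer(x, W_coeffs, b_coeffs, snr=0.5)
```

With only the order-0 coefficients present, both functions reduce to a standard sigmoid layer, so in this sketch the standard DNN is the special case with no SNR dependence.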

Bibliographic reference.  Zhao, Rui / Li, Jinyu / Gong, Yifan (2014): "Variable-component deep neural network for robust speech recognition", In INTERSPEECH-2014, 2719-2723.