In this paper, we propose a new feature normalization approach for deep neural network (DNN) based adaptive speech recognition. Each speaker is represented by an i-vector, and an i-vector-dependent block-diagonal transformation matrix, obtained from a tensor, is applied to the input features. The tensor parameters are shared across all frames in the input window and are factorized into three matrices. The proposed approach is more practical for real-world speech recognition tasks because it eliminates the time-consuming adaptive training process needed to estimate the transformation matrix in feature-space discriminative linear regression (fDLR). We evaluated the proposed approach empirically on a conversational telephone speech recognition task. Experimental results show that it yields a 7% relative improvement for a long short-term memory (LSTM) network based speech recognition system.
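The abstract does not state the exact form of the three-matrix factorization. The sketch below illustrates the idea under one assumption: a rank-R CP-style decomposition T[i][j][l] = sum_r A[i][r] * B[j][r] * C[l][r]. Under that assumption, contracting the tensor with an i-vector v yields a per-speaker d x d block W(v) = A diag(C^T v) B^T, and sharing that block across every frame of the input window corresponds to applying the block-diagonal matrix diag(W, ..., W) to the stacked window. The function names, the factor matrices `A`, `B`, `C`, and their shapes are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transform_from_ivector(A: Matrix, B: Matrix, C: Matrix, ivec: Vector) -> Matrix:
    """Build a d x d block W(v) = A diag(C^T v) B^T (assumed CP-style factorization).

    Hypothetical shapes: A is d x R, B is d x R, C is k x R, ivec has length k.
    """
    d = len(A)
    R = len(A[0])
    k = len(ivec)
    # Project the speaker's i-vector onto the R factor weights: g = C^T v.
    g = [sum(C[l][r] * ivec[l] for l in range(k)) for r in range(R)]
    # W[i][j] = sum_r A[i][r] * g[r] * B[j][r]
    return [
        [sum(A[i][r] * g[r] * B[j][r] for r in range(R)) for j in range(d)]
        for i in range(d)
    ]


def apply_block_diagonal(W: Matrix, window: List[Vector]) -> List[Vector]:
    """Apply the same block W to every frame in the window.

    This equals multiplying the stacked window by diag(W, ..., W),
    i.e. the transform parameters are shared by all frames.
    """
    return [
        [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
        for x in window
    ]


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]   # d=2, R=2
    B = [[1.0, 0.0], [0.0, 1.0]]
    C = [[1.0, 2.0]]               # k=1 (toy one-dimensional i-vector)
    W = transform_from_ivector(A, B, C, [0.5])
    print(W)                                                 # [[0.5, 0.0], [0.0, 1.0]]
    print(apply_block_diagonal(W, [[2.0, 4.0], [1.0, 1.0]]))  # [[1.0, 4.0], [0.5, 1.0]]
```

Because the i-vector only enters through the small projection g = C^T v, a new speaker's transform can be computed directly from their i-vector, with no per-speaker gradient-based adaptation, which is the practical advantage the abstract claims over fDLR.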
Bibliographic reference. Li, Xiangang / Wu, Xihong (2015): "I-vector dependent feature space transformations for adaptive speech recognition", In INTERSPEECH-2015, 3635-3639.