16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

I-Vector Dependent Feature Space Transformations for Adaptive Speech Recognition

Xiangang Li, Xihong Wu

Peking University, China

In this paper, we propose a new feature normalization approach for adaptive speech recognition based on deep neural networks (DNNs). Each speaker is represented by an i-vector, and an i-vector dependent block-diagonal transformation matrix is obtained from a tensor and applied to the input features. The tensor parameters are shared by all frames in the input window and are factorized into three matrices. The proposed approach is more practical for real-world speech recognition tasks because it eliminates the time-consuming adaptive training process needed to estimate the transformation matrix in feature-space discriminative linear regression (fDLR). We empirically evaluated the proposed approach on a conversational telephone speech recognition task. Experimental results show that it can yield a 7% relative improvement for a long short-term memory (LSTM) network based speech recognition system.
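To make the construction concrete, here is a minimal sketch in Python. It assumes, which the abstract does not state, that the three factor matrices form a CP-style decomposition of the tensor, so that the per-speaker block is W(s) = A diag(C s) B^T for an i-vector s. The function names `ivector_transform` and `transform_window` are hypothetical, no bias term is modeled, and plain lists stand in for a real tensor library. Because W(s) depends only on the i-vector, it can be computed directly at test time, without the per-speaker adaptive training that fDLR requires.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x len(v)) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def ivector_transform(a: Matrix, b: Matrix, c: Matrix, ivec: Vector) -> Matrix:
    """Build one d x d block W(s) = A diag(C s) B^T.

    Assumed (hypothetical) shapes: a is d x F, b is d x F, c is F x K,
    ivec has length K. The factorization itself is an assumption.
    """
    gates = matvec(c, ivec)  # length-F speaker-dependent weights C s
    d, f = len(a), len(gates)
    return [
        [sum(a[i][k] * gates[k] * b[j][k] for k in range(f)) for j in range(d)]
        for i in range(d)
    ]


def transform_window(frames: List[Vector], w: Matrix) -> List[Vector]:
    """Apply the same block to every frame in the context window.

    Sharing one block across frames is equivalent to multiplying the
    stacked window by a block-diagonal matrix with identical blocks.
    """
    return [matvec(w, x) for x in frames]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    w = ivector_transform(identity, identity, [[2.0], [3.0]], [1.0])
    print(w)  # a diagonal block scaled by the i-vector dependent gates
    print(transform_window([[1.0, 1.0], [2.0, 0.0]], w))
```

With identity factors A and B, the block reduces to diag(C s), which makes it easy to check that a change in the i-vector rescales every frame of the window in the same way.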


Bibliographic reference. Li, Xiangang / Wu, Xihong (2015): "I-vector dependent feature space transformations for adaptive speech recognition", In INTERSPEECH-2015, 3635-3639.