16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Robust i-Vector Extraction for Neural Network Adaptation in Noisy Environment

Chengzhu Yu (1), Atsunori Ogawa (2), Marc Delcroix (2), Takuya Yoshioka (2), Tomohiro Nakatani (2), John H. L. Hansen (1)

(1) University of Texas at Dallas, USA
(2) NTT Corporation, Japan

In this study, we explore i-vector based adaptation of deep neural networks (DNNs) in noisy environments. We first demonstrate the importance of encapsulating environment and channel variability in the i-vectors used for DNN adaptation in noisy conditions. To obtain robust i-vectors without losing noise and channel variability information, we investigate parallel feature-based i-vector extraction for DNN adaptation. Specifically, different types of features are used separately in the two stages of i-vector extraction, namely universal background model (UBM) state alignment and i-vector computation. To capture noise- and channel-specific feature variation, conventional MFCC features are still used for i-vector computation. However, much more robust features, such as Vector Taylor Series (VTS) enhanced features and bottleneck features, are exploited for UBM state alignment. Experimental results on Aurora-4 show that the parallel feature-based i-vectors yield performance gains of up to 9.2% relative compared to a baseline DNN-HMM system and 3.3% compared to a system using conventional MFCC-based i-vectors.
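The two-stage extraction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal-covariance UBM, the standard total-variability point estimate for the i-vector, and a second set of UBM means and variances expressed in the MFCC space for centering the statistics. All names (`gmm_posteriors`, `parallel_ivector`, `align_ubm`, `stat_means`, `stat_vars`, `T_mat`) and the random toy data are hypothetical.

```python
import numpy as np

def gmm_posteriors(feats, weights, means, variances):
    """Frame-level component posteriors under a diagonal-covariance GMM.
    feats: (n_frames, D); weights: (C,); means, variances: (C, D)."""
    diff = feats[:, None, :] - means[None, :, :]              # (n_frames, C, D)
    log_lik = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_lik                      # (n_frames, C)
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def parallel_ivector(align_feats, stat_feats, align_ubm,
                     stat_means, stat_vars, T_mat):
    """Stage 1: align frames with robust features (e.g. VTS-enhanced or bottleneck).
    Stage 2: accumulate statistics on MFCC-like features and solve for the i-vector.
    T_mat: (C * D_stat, R) total variability matrix (assumed already trained)."""
    post = gmm_posteriors(align_feats, *align_ubm)            # alignment stage
    N = post.sum(axis=0)                                      # zeroth-order stats, (C,)
    F = post.T @ stat_feats - N[:, None] * stat_means         # centered first-order stats
    C, D = stat_means.shape
    R = T_mat.shape[1]
    T3 = T_mat.reshape(C, D, R)
    precision = np.eye(R)                                     # I + sum_c N_c T_c' S_c^-1 T_c
    linear = np.zeros(R)                                      # sum_c T_c' S_c^-1 F_c
    for c in range(C):
        Tc_scaled = T3[c] / stat_vars[c][:, None]             # S_c^-1 T_c
        precision += N[c] * T3[c].T @ Tc_scaled
        linear += Tc_scaled.T @ F[c]
    return np.linalg.solve(precision, linear)

# Toy usage with random data standing in for real features and models.
rng = np.random.default_rng(0)
n_frames, C, D_align, D_stat, R = 50, 4, 6, 13, 5
align_feats = rng.normal(size=(n_frames, D_align))            # robust features
stat_feats = rng.normal(size=(n_frames, D_stat))              # MFCC-like features
align_ubm = (np.full(C, 1.0 / C),
             rng.normal(size=(C, D_align)),
             np.ones((C, D_align)))
stat_means = rng.normal(size=(C, D_stat))
stat_vars = np.ones((C, D_stat))
T_mat = 0.1 * rng.normal(size=(C * D_stat, R))
ivec = parallel_ivector(align_feats, stat_feats, align_ubm,
                        stat_means, stat_vars, T_mat)
print(ivec.shape)
```

The point of the split is visible in `parallel_ivector`: the posteriors depend only on the robust alignment features, while the first-order statistics, and therefore the noise and channel information carried by the i-vector, come from the MFCC-like features.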


Bibliographic reference.  Yu, Chengzhu / Ogawa, Atsunori / Delcroix, Marc / Yoshioka, Takuya / Nakatani, Tomohiro / Hansen, John H. L. (2015): "Robust i-vector extraction for neural network adaptation in noisy environment", In INTERSPEECH-2015, 2854-2857.