ISCA Archive Interspeech 2015

Robust i-vector extraction for neural network adaptation in noisy environment

Chengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen

In this study, we explore i-vector-based adaptation of deep neural networks (DNNs) in noisy environments. We first demonstrate the importance of encapsulating environment and channel variability into i-vectors for DNN adaptation in noisy conditions. To obtain robust i-vectors without losing noise and channel variability information, we investigate the use of parallel feature-based i-vector extraction for DNN adaptation. Specifically, different types of features are used separately during the two stages of i-vector extraction, namely universal background model (UBM) state alignment and i-vector computation. To capture noise- and channel-specific feature variation, conventional MFCC features are still used for i-vector computation. However, much more robust features, such as Vector Taylor Series (VTS) enhanced features and bottleneck features, are exploited for UBM state alignment. Experimental results on Aurora-4 show that the parallel feature-based i-vectors yield performance gains of up to 9.2% relative compared to a baseline DNN-HMM system and 3.3% compared to a system using conventional MFCC-based i-vectors.
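The two-stage split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM as the UBM, and the function names (`gmm_posteriors`, `parallel_feature_stats`) and array layouts are hypothetical. Frame posteriors are computed from the robust alignment features, while the zeroth- and first-order statistics that an i-vector estimator would consume are accumulated over the time-aligned MFCC frames.

```python
import numpy as np


def gmm_posteriors(feats, weights, means, variances):
    """Per-frame component posteriors under a diagonal-covariance GMM.

    feats: (T, D), weights: (C,), means: (C, D), variances: (C, D).
    Returns an array of shape (T, C) whose rows sum to 1.
    """
    diff = feats[:, None, :] - means[None, :, :]            # (T, C, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)   # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_post = np.log(weights)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def parallel_feature_stats(align_feats, stat_feats, weights, means, variances):
    """Align with one feature stream, accumulate statistics with another.

    align_feats: (T, D_align) robust features (e.g. enhanced or bottleneck).
    stat_feats:  (T, D_stat) MFCC frames, time-aligned with align_feats.
    The UBM parameters live in the alignment feature space (assumption).
    Returns zeroth-order stats (C,) and first-order stats (C, D_stat).
    """
    if align_feats.shape[0] != stat_feats.shape[0]:
        raise ValueError("feature streams must have the same number of frames")
    post = gmm_posteriors(align_feats, weights, means, variances)
    zeroth = post.sum(axis=0)        # soft frame count per component
    first = post.T @ stat_feats      # posterior-weighted sums of MFCC frames
    return zeroth, first
```

The sketch stops at the sufficient statistics. Estimating the i-vector itself additionally requires a total variability matrix and component means and covariances in the MFCC space, which are omitted here.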

doi: 10.21437/Interspeech.2015-600

Cite as: Yu, C., Ogawa, A., Delcroix, M., Yoshioka, T., Nakatani, T., Hansen, J.H.L. (2015) Robust i-vector extraction for neural network adaptation in noisy environment. Proc. Interspeech 2015, 2854-2857, doi: 10.21437/Interspeech.2015-600

  author={Chengzhu Yu and Atsunori Ogawa and Marc Delcroix and Takuya Yoshioka and Tomohiro Nakatani and John H. L. Hansen},
  title={{Robust i-vector extraction for neural network adaptation in noisy environment}},
  booktitle={Proc. Interspeech 2015},