ISCA Archive Interspeech 2015

Speaker adaptation using the i-vector technique for bottleneck features

Patrick Cardinal, Najim Dehak, Yu Zhang, James Glass

Deep Neural Networks (DNNs) have been widely and successfully applied to speaker-independent Automatic Speech Recognition (ASR). However, these models are not easily adapted to the characteristics of a specific speaker. One recently proposed approach addresses this issue by using the i-vector representation as an additional input to the DNN. The i-vector provides information about the speaker as well as the environmental conditions of a given recording. This approach achieved a significant improvement in a hybrid system combining a DNN with a Hidden Markov Model (HMM). In this paper, we study the effect of speaker adaptation based on the i-vector framework in the context of stacked bottleneck features. These features, extracted from a second level of DNNs, are modelled by a classical Gaussian Mixture Model (GMM) ASR system. The proposed approach achieved an absolute WER improvement of 1.2% on an Arabic broadcast news task.
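
As a rough illustration of what using the i-vector as an input to the DNN can look like, the sketch below appends a single recording-level i-vector to every acoustic frame before the frames are passed to a network. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `append_ivector`, the 40-dimensional frames, and the 100-dimensional i-vector are all hypothetical, and the abstract does not specify how the i-vector is extracted or combined with the acoustic features.

```python
import numpy as np


def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Concatenate one recording-level i-vector onto every acoustic frame.

    frames:  array of shape (num_frames, feat_dim)
    ivector: array of shape (ivec_dim,)
    returns: array of shape (num_frames, feat_dim + ivec_dim)
    """
    if frames.ndim != 2 or ivector.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D i-vector")
    # Repeat the same i-vector for each frame (read-only broadcast view).
    tiled = np.broadcast_to(ivector, (frames.shape[0], ivector.shape[0]))
    # Concatenate along the feature axis; this copies into a new array.
    return np.concatenate([frames, tiled], axis=1)


# Hypothetical sizes, chosen only for illustration.
frames = np.zeros((300, 40), dtype=np.float32)   # 300 frames of 40-dim features
ivector = np.ones(100, dtype=np.float32)         # one 100-dim i-vector
augmented = append_ivector(frames, ivector)
print(augmented.shape)  # (300, 140)
```

Because the i-vector is constant across a recording, every frame carries the same speaker and environment descriptor, while the acoustic part of the input still varies from frame to frame.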

doi: 10.21437/Interspeech.2015-603

Cite as: Cardinal, P., Dehak, N., Zhang, Y., Glass, J. (2015) Speaker adaptation using the i-vector technique for bottleneck features. Proc. Interspeech 2015, 2867-2871, doi: 10.21437/Interspeech.2015-603

  author={Patrick Cardinal and Najim Dehak and Yu Zhang and James Glass},
  title={{Speaker adaptation using the i-vector technique for bottleneck features}},
  booktitle={Proc. Interspeech 2015},