Deep Neural Networks (DNNs) have been widely used and successfully applied in the context of speaker-independent Automatic Speech Recognition (ASR). However, these models are not easily adapted to the characteristics of a specific speaker. Recently, an approach was proposed to address this issue, which consists of using the i-vector representation as input to the DNN. The i-vector provides information about both the speaker and the environmental conditions of a given recording. This approach achieved a significant improvement in the context of a hybrid system combining a DNN with a Hidden Markov Model (HMM). In this paper, we study the effect of speaker adaptation based on the i-vector framework in the context of stacked bottleneck features. These features, extracted from a second level of DNNs, are modelled by a classical Gaussian Mixture Model (GMM) ASR system. The proposed approach achieved an absolute word error rate (WER) improvement of 1.2% on an Arabic broadcast news task.
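As a rough illustration of the adaptation idea described above (not the authors' code), the sketch below shows how a per-recording i-vector can be tiled and concatenated to every acoustic frame before being fed to a bottleneck DNN. The feature and i-vector dimensions (40 and 100) and the function name are illustrative assumptions only.

import numpy as np

def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Tile the recording-level i-vector and concatenate it to every frame.

    frames  : (num_frames, feat_dim) acoustic features for one recording
    ivector : (ivec_dim,) i-vector estimated for the same recording
    returns : (num_frames, feat_dim + ivec_dim) speaker-adapted DNN input
    """
    tiled = np.tile(ivector, (frames.shape[0], 1))   # repeat i-vector per frame
    return np.concatenate([frames, tiled], axis=1)   # append along feature axis

if __name__ == "__main__":
    frames = np.random.randn(300, 40)    # e.g. 3 s of 40-dim filterbank features
    ivector = np.random.randn(100)       # e.g. a 100-dim i-vector (assumed size)
    dnn_input = append_ivector(frames, ivector)
    print(dnn_input.shape)               # (300, 140)

The same i-vector is shared by all frames of a recording, so the network receives a constant descriptor of the speaker and channel alongside the frame-level acoustics.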
Cite as: Cardinal, P., Dehak, N., Zhang, Y., Glass, J. (2015) Speaker adaptation using the i-vector technique for bottleneck features. Proc. Interspeech 2015, 2867-2871, doi: 10.21437/Interspeech.2015-603
@inproceedings{cardinal15_interspeech,
  author={Patrick Cardinal and Najim Dehak and Yu Zhang and James Glass},
  title={{Speaker adaptation using the i-vector technique for bottleneck features}},
  year={2015},
  booktitle={Proc. Interspeech 2015},
  pages={2867--2871},
  doi={10.21437/Interspeech.2015-603}
}