16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Speaker Adaptation Using the I-Vector Technique for Bottleneck Features

Patrick Cardinal, Najim Dehak, Yu Zhang, James Glass


Deep Neural Networks (DNNs) have been widely and successfully applied to speaker-independent Automatic Speech Recognition (ASR). However, these models are not easily adapted to the characteristics of a specific speaker. One recently proposed approach to this issue uses the I-vector representation as an additional input to the DNN. The I-vector provides information about the speaker as well as the environmental conditions of a given recording. This approach achieved a significant improvement in a hybrid system combining a DNN with a Hidden Markov Model (HMM). In this paper, we study the effect of speaker adaptation based on the I-vector framework in the context of stacked bottleneck features. These features, extracted from a second level of DNNs, are modelled by a classical Gaussian Mixture Model (GMM) ASR system. The proposed approach achieved an absolute word error rate (WER) improvement of 1.2% on an Arabic broadcast news task.
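The two input-side operations mentioned in the abstract can be illustrated with a minimal NumPy sketch: appending one utterance-level I-vector to every acoustic frame (the speaker-aware DNN input), and stacking context frames of first-level bottleneck outputs to form the input of the second-level DNN. This is a sketch under stated assumptions, not the authors' implementation: the function names `append_ivector` and `stack_context` are hypothetical, where the I-vector is injected in the stacked bottleneck pipeline is not specified here, and the context offsets (for example -10, -5, 0, 5, 10) are an assumed, commonly used choice rather than values taken from the paper.

```python
import numpy as np


def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Concatenate one utterance-level i-vector to every frame.

    frames:  (T, D) frame-level acoustic features.
    ivector: (K,) vector summarizing speaker and channel conditions.
    Returns a (T, D + K) matrix usable as a speaker-aware DNN input.
    """
    num_frames = frames.shape[0]
    # Repeat the same i-vector for every frame of the utterance.
    return np.hstack([frames, np.tile(ivector, (num_frames, 1))])


def stack_context(feats: np.ndarray, offsets) -> np.ndarray:
    """Stack frames at the given time offsets around each frame.

    Hypothetical helper: applied to first-level bottleneck outputs, it builds
    the input of the second-level DNN. Out-of-range offsets are clipped, so
    the first/last frame is repeated at the utterance edges.
    Returns a (T, len(offsets) * D) matrix.
    """
    num_frames = feats.shape[0]
    offs = np.asarray(offsets)
    # idx[t, j] = index of the frame at offset offs[j] from frame t.
    idx = np.clip(np.arange(num_frames)[:, None] + offs[None, :], 0, num_frames - 1)
    # feats[idx] has shape (T, len(offsets), D); flatten the last two axes.
    return feats[idx].reshape(num_frames, -1)


if __name__ == "__main__":
    frames = np.arange(6, dtype=float).reshape(3, 2)  # toy: 3 frames, 2 dims
    ivec = np.array([9.0, 8.0])                        # toy 2-dim i-vector
    print(append_ivector(frames, ivec))
    print(stack_context(frames, [-1, 0, 1]))
```

Tiling the I-vector keeps it constant across the utterance, which matches its role as a per-recording description of speaker and environment; the DNN then learns how to use it alongside the frame-level features.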


Bibliographic reference.  Cardinal, Patrick / Dehak, Najim / Zhang, Yu / Glass, James (2015): "Speaker adaptation using the i-vector technique for bottleneck features", In INTERSPEECH-2015, 2867-2871.