We recently proposed a novel fast adaptation method for hybrid DNN-HMM models in speech recognition. This method learns an adaptation neural network (NN) that, given a suitable speaker code, transforms a speaker's input speech features into a more speaker-independent space. The adaptation NN weights are learned from the whole multi-speaker training dataset, while a speaker code is learned for each speaker during adaptation. Our previous work has shown that this method is effective in adapting DNNs even when only a very small amount of adaptation data is available. However, it does not work well for convolutional neural networks (CNNs). In this paper, we investigate the fast adaptation of CNN models. We first modify the speaker-code-based adaptation method to better suit the CNN structure. We then investigate a new adaptation scheme that uses speaker-specific adaptive node output weights. These weights scale the outputs of individual nodes to optimize the model for new speakers. Experimental results on the TIMIT dataset demonstrate that both methods are effective for adapting CNN-based acoustic models, and that combining the two methods yields even better performance.
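The two kinds of speaker-specific parameters described above (a speaker code fed to an adaptation NN, and per-node output scaling weights) can be illustrated with a minimal NumPy sketch. It is not the paper's CNN-specific formulation: it assumes a single fully connected hidden layer, an adaptation NN made of one tanh layer with a residual connection, and random stand-ins for the speaker-independent weights. All names and sizes are hypothetical. During adaptation only the speaker code and the node scales are updated by gradient descent on cross-entropy; the shared weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: feature dim, speaker-code dim, hidden units, output classes.
FEAT, CODE, HID, OUT = 40, 8, 64, 10

# Speaker-independent weights. In the described method these would be trained on
# the multi-speaker training set; here they are random placeholders.
W_feat = rng.normal(scale=0.1, size=(FEAT, FEAT))  # adaptation NN, feature path
W_code = rng.normal(scale=0.1, size=(CODE, FEAT))  # adaptation NN, speaker-code path
W_hid = rng.normal(scale=0.1, size=(FEAT, HID))    # acoustic model hidden layer
W_out = rng.normal(scale=0.1, size=(HID, OUT))     # acoustic model output layer

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, code, node_scale):
    """x: (N, FEAT) frames, code: (CODE,) speaker code, node_scale: (HID,)."""
    pre = x @ W_feat + code @ W_code          # adaptation NN input (assumed form)
    x_adapted = x + np.tanh(pre)              # residual connection is an assumption
    h = relu(x_adapted @ W_hid)               # hidden node outputs
    probs = softmax((h * node_scale) @ W_out) # each node output scaled per speaker
    return pre, h, probs

def adapt_step(x, labels, code, node_scale, lr=0.1):
    """One gradient step updating only the speaker code and node scales."""
    n = x.shape[0]
    pre, h, probs = forward(x, code, node_scale)
    loss = -np.log(probs[np.arange(n), labels]).mean()
    d_logits = probs.copy()
    d_logits[np.arange(n), labels] -= 1.0     # softmax + cross-entropy gradient
    d_logits /= n
    d_hs = d_logits @ W_out.T                 # grad w.r.t. scaled hidden outputs
    g_scale = (d_hs * h).sum(axis=0)          # grad w.r.t. node_scale
    d_h = d_hs * node_scale * (h > 0)         # back through scaling and ReLU
    d_pre = (d_h @ W_hid.T) * (1.0 - np.tanh(pre) ** 2)  # back through tanh
    g_code = d_pre.sum(axis=0) @ W_code.T     # grad w.r.t. speaker code
    return code - lr * g_code, node_scale - lr * g_scale, loss

# Hypothetical adaptation data for one new speaker: a few labelled frames.
x_spk = rng.normal(size=(20, FEAT))
y_spk = rng.integers(0, OUT, size=20)
code = np.zeros(CODE)      # speaker code starts at zero
node_scale = np.ones(HID)  # scales of 1 leave the unadapted model unchanged
losses = []
for _ in range(50):
    code, node_scale, loss = adapt_step(x_spk, y_spk, code, node_scale)
    losses.append(loss)
```

Initialising the node scales to 1 and the speaker code to 0 means adaptation starts from the speaker-independent model, and because only `CODE + HID` parameters are updated per speaker, a small amount of adaptation data can suffice, which is the motivation for both schemes.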
Bibliographic reference. Abdel-Hamid, Ossama / Jiang, Hui (2013): "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition", In INTERSPEECH-2013, 1248-1252.