16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

GMM-Derived Features for Effective Unsupervised Adaptation of Deep Neural Network Acoustic Models

Natalia Tomashenko (1), Yuri Khokhlov (2)

(1) Speech Technology Center, Russia
(2) STC-innovations, Russia

In this paper we investigate GMM-derived features, recently introduced for the adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. We improve the previously proposed adaptation algorithm by applying the concept of speaker adaptive training (SAT) to DNNs built on GMM-derived features and by using fMLLR-adapted features to train an auxiliary GMM model. Traditional adaptation algorithms, such as maximum a posteriori (MAP) adaptation and feature-space maximum likelihood linear regression (fMLLR), are applied to the auxiliary GMM model used in the SAT procedure for a DNN. Experimental results on the Wall Street Journal (WSJ0) corpus show that the proposed adaptation technique can provide, on average, a 17-28% relative word error rate (WER) reduction on different adaptation sets under an unsupervised adaptation setup, compared to speaker-independent (SI) DNN-HMM systems built on conventional features. We also found that fMLLR adaptation of the SAT DNN trained on GMM-derived features outperforms fMLLR adaptation of the SAT DNN trained on conventional features by up to 14% relative WER reduction.
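
The results above are quoted as relative WER reductions, so a short sketch of how such a figure is usually computed may help. The function name and the WER values below are illustrative assumptions, not taken from the paper; the formula is the standard one, (baseline - adapted) / baseline, expressed in percent.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent, of an adapted system
    over a baseline system. Both arguments are absolute WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs for illustration only (not results from the paper):
# an SI baseline at 10.0% WER and an adapted system at 8.0% WER.
reduction = relative_wer_reduction(10.0, 8.0)
print(f"relative WER reduction: {reduction:.1f}%")
```

With these hypothetical inputs, an absolute drop of 2 WER points corresponds to a 20% relative reduction, which is why relative and absolute figures should not be compared directly.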

Bibliographic reference. Tomashenko, Natalia / Khokhlov, Yuri (2015): "GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models", in INTERSPEECH-2015, 2882-2886.