On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models

Natalia Tomashenko, Yuri Khokhlov, Yannick Estève


In this paper we investigate the Gaussian Mixture Model (GMM) framework for the adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. In previous work, an initial attempt was made at the efficient transfer of adaptation algorithms from the GMM framework to DNN models. In this work we extend that method, explore and analyze it in more detail with respect to a state-of-the-art DNN speech recognition setup, and propose several novel ways to improve adaptation performance: using bottleneck features for GMM-derived feature extraction; combining GMM-derived features with conventional features at different levels of the DNN architecture; moving from monophones to triphones in the auxiliary GMM model in order to increase the number of adapted classes; and, finally, using lattice-based information and confidence scores in maximum a posteriori adaptation of the auxiliary GMM model. Experimental results on the TED-LIUM corpus show that the proposed adaptation technique can be effectively integrated into a DNN setup at different levels and provides additional gains in recognition performance.
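To illustrate the core idea of GMM-derived features mentioned above, the following is a minimal sketch (not the paper's exact pipeline): each acoustic frame is scored against a set of class-specific GMMs (e.g. one per monophone state of the auxiliary model), the resulting log-likelihood vector serves as the frame's feature, and features are spliced over a context window before being fed to the DNN. All function names, the diagonal-covariance assumption, and the context width are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gmm_loglik(frame, means, variances, weights):
    """Log-likelihood of one frame under a diagonal-covariance GMM.

    frame: (d,); means, variances: (n_components, d); weights: (n_components,).
    Uses the log-sum-exp trick for numerical stability.
    """
    diff = frame - means  # broadcast to (n_components, d)
    exponent = -0.5 * np.sum(diff ** 2 / variances
                             + np.log(2.0 * np.pi * variances), axis=1)
    scores = np.log(weights) + exponent
    m = np.max(scores)
    return m + np.log(np.sum(np.exp(scores - m)))

def gmmd_features(frames, gmms, context=5):
    """GMM-derived features: per-frame log-likelihoods for each class GMM,
    spliced with +/- `context` neighboring frames (edge-padded).

    frames: (T, d); gmms: list of (means, variances, weights) tuples,
    one per class (e.g. per monophone state of the auxiliary GMM model).
    Returns an array of shape (T, (2 * context + 1) * n_classes).
    """
    ll = np.array([[gmm_loglik(x, *g) for g in gmms] for x in frames])
    T = ll.shape[0]
    padded = np.pad(ll, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(T)])
```

Speaker adaptation then amounts to MAP-adapting the auxiliary GMMs on the speaker's data and recomputing these features, leaving the DNN itself unchanged; this sketch omits that adaptation step.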


DOI: 10.21437/Interspeech.2016-1230

Cite as

Tomashenko, N., Khokhlov, Y., Estève, Y. (2016) On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models. Proc. Interspeech 2016, 3788-3792.

Bibtex
@inproceedings{Tomashenko+2016,
author={Natalia Tomashenko and Yuri Khokhlov and Yannick Estève},
title={On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1230},
url={http://dx.doi.org/10.21437/Interspeech.2016-1230},
pages={3788--3792}
}