15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Speaker Adaptation of Context Dependent Deep Neural Networks Based on MAP-Adaptation and GMM-Derived Feature Processing

Natalia Tomashenko, Yuri Khokhlov

Speech Technology Center, Russia

In this paper we propose a novel speaker adaptation method for a context-dependent deep neural network HMM (CD-DNN-HMM) acoustic model. The approach uses GMM-derived features as the input to the DNN. Processing features in this way makes it possible to apply GMM-HMM adaptation algorithms within the neural network framework. Adaptation to a new speaker is performed simply by adapting the auxiliary GMM-HMM model used to compute the GMM-derived features, and can be regarded as feature-space adaptation for the DNN system. In this work, traditional maximum a posteriori (MAP) adaptation is applied to the auxiliary GMM-HMM model. Experiments show that, in a supervised adaptation setup, the proposed technique provides on average a 5%–36% relative reduction in word error rate on different adaptation sets, compared to speaker-independent (SI) CD-DNN-HMM systems. In addition, several multi-stream combination techniques are examined in order to improve the performance of the baseline SI model.
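The idea of adapting the auxiliary model rather than the DNN can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one diagonal-covariance Gaussian per auxiliary-model state, takes the GMM-derived feature vector to be the per-state log-likelihoods of a frame, and uses the standard MAP mean update with a relevance factor `tau`. The function names, the toy data, and the value of `tau` are all hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_derived_features(x, means, variances):
    """Assumed feature definition: one log-likelihood per auxiliary state."""
    return [log_gauss_diag(x, m, v) for m, v in zip(means, variances)]


def map_adapt_means(frames, posteriors, means, tau=10.0):
    """Standard MAP mean update for each state k:
    mu_k' = (tau * mu_k + sum_t gamma_tk * x_t) / (tau + sum_t gamma_tk)
    """
    adapted = []
    for k, mu in enumerate(means):
        occupancy = sum(p[k] for p in posteriors)
        weighted_sum = [
            sum(p[k] * x[d] for p, x in zip(posteriors, frames))
            for d in range(len(mu))
        ]
        adapted.append(
            [(tau * m + s) / (tau + occupancy) for m, s in zip(mu, weighted_sum)]
        )
    return adapted


# Toy speaker-independent auxiliary model with two states (hypothetical values).
si_means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Adaptation frames, all aligned to state 0 (supervised setup: the
# alignment is assumed to come from a known transcript).
frames = [[1.0, 1.0], [1.0, 1.0]]
posteriors = [[1.0, 0.0], [1.0, 0.0]]

sa_means = map_adapt_means(frames, posteriors, si_means, tau=2.0)

# The DNN input changes only because the auxiliary model changed.
features_before = gmm_derived_features(frames[0], si_means, variances)
features_after = gmm_derived_features(frames[0], sa_means, variances)
```

In this toy case the mean of state 0 moves halfway towards the adaptation data, while state 1, which received no adaptation frames, keeps its speaker-independent mean. The DNN weights are untouched; only its input features shift, which is why the abstract describes the method as adaptation in the feature space.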


Bibliographic reference. Tomashenko, Natalia / Khokhlov, Yuri (2014): "Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing", in INTERSPEECH-2014, 2997-3001.