Speaker adaptive training (SAT) is a useful technique for building speech recognition systems on non-homogeneous data. When SAT is combined with discriminative training criteria, maximum likelihood (ML) transforms are often used for unsupervised adaptation tasks, because discriminatively estimated transforms are highly sensitive to errors in the supervision hypothesis. In this paper, speaker adaptive training based on discriminative mapping transforms (DMTs) is proposed. DMTs are speaker-independent discriminative transforms that are applied to ML-estimated speaker-specific transforms. Because DMTs are estimated during training, they are not affected by errors in the supervision hypothesis. The proposed method was evaluated on an English conversational telephone speech task and was found to significantly outperform the standard discriminative SAT schemes.
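The idea of applying a speaker-independent transform on top of a speaker-specific one can be illustrated with a minimal sketch. It assumes, purely for illustration, that both transforms are affine maps acting on a Gaussian mean vector; the abstract does not state the form of the transforms. All names here (`A_ml`, `b_ml`, `A_dmt`, `b_dmt`, `compose_affine`) and all numeric values are hypothetical placeholders, not quantities from the paper, and no estimation procedure is shown.

```python
import numpy as np

def apply_affine(A, b, x):
    """Apply the affine map x -> A x + b."""
    return A @ x + b

def compose_affine(A_outer, b_outer, A_inner, b_inner):
    """Return (A, b) such that A x + b == outer(inner(x))."""
    return A_outer @ A_inner, A_outer @ b_inner + b_outer

rng = np.random.default_rng(0)
dim = 3

# Hypothetical canonical-model mean (placeholder values).
mu = rng.normal(size=dim)

# Assumed speaker-specific transform, standing in for one estimated
# with ML from a possibly errorful supervision hypothesis.
A_ml = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
b_ml = 0.1 * rng.normal(size=dim)

# Assumed speaker-independent mapping transform, standing in for one
# estimated discriminatively on training data and shared by all speakers.
A_dmt = np.eye(dim) + 0.05 * rng.normal(size=(dim, dim))
b_dmt = 0.05 * rng.normal(size=dim)

# Two-step view: ML adaptation first, then the shared mapping.
adapted_two_step = apply_affine(A_dmt, b_dmt, apply_affine(A_ml, b_ml, mu))

# Equivalent single composite transform for this speaker.
A_c, b_c = compose_affine(A_dmt, b_dmt, A_ml, b_ml)
adapted_composed = apply_affine(A_c, b_c, mu)

print(np.allclose(adapted_two_step, adapted_composed))
```

Under these assumptions, only the ML transform depends on the test speaker's supervision hypothesis; the mapping transform is fixed after training, which is the property the abstract highlights.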
Bibliographic reference. Raut, C. K. / Yu, K. / Gales, M. J. F. (2008): "Adaptive training using discriminative mapping transforms", In INTERSPEECH-2008, 1697-1700.