Speaker variability is one of the major sources of error in ASR systems. Speaker adaptation estimates speaker-specific models from speaker-independent ones to reduce the mismatch between training and testing conditions that arises from speaker variability. One commonly adopted approach is the transformation-based method. This paper compares discriminative input and output transforms for speaker adaptation in hybrid NN/HMM systems and further investigates them under both structural and data-driven constraints. Experimental results show that the data-driven constrained discriminative transforms are much more robust for unsupervised adaptation.
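To illustrate where an input transform and an output transform sit relative to a speaker-independent network, the following is a minimal sketch and not the paper's implementation. It assumes both transforms are affine, that the input transform acts on the acoustic feature vector, and that the output transform acts on the pre-softmax activations. The one-layer network, its weights `SI_W` and `SI_B`, and all function names are hypothetical. The discriminative estimation of the transforms and the structural and data-driven constraints are not shown.

```python
import math

def affine(matrix, bias, vec):
    """Return matrix @ vec + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(matrix, bias)]

def softmax(z):
    """Numerically stable softmax over a list of activations."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def identity(n):
    """n x n identity matrix, a natural starting point for a transform."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical speaker-independent network: one affine layer plus softmax,
# mapping 2-dimensional features to posteriors over 3 states.
SI_W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
SI_B = [0.0, 0.1, -0.1]

def posteriors(x, in_tf=None, out_tf=None):
    """State posteriors with optional speaker-dependent transforms.

    in_tf and out_tf are (matrix, bias) pairs. The speaker-independent
    weights stay fixed; only the transforms would be adapted per speaker.
    """
    if in_tf is not None:
        x = affine(in_tf[0], in_tf[1], x)   # input transform on features
    z = affine(SI_W, SI_B, x)               # fixed speaker-independent layer
    if out_tf is not None:
        z = affine(out_tf[0], out_tf[1], z) # output transform on activations
    return softmax(z)
```

With identity matrices and zero biases, both transforms leave the speaker-independent posteriors unchanged, so adaptation in this sketch would start from the unadapted system and move away from it only as far as the adaptation data supports.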
Bibliographic reference. Li, Bo / Sim, Khe Chai (2010): "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems", In INTERSPEECH-2010, 526-529.