11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Comparison of Discriminative Input and Output Transformations for Speaker Adaptation in the Hybrid NN/HMM Systems

Bo Li, Khe Chai Sim

National University of Singapore, Singapore

Speaker variability is one of the major sources of error in ASR systems. Speaker adaptation estimates speaker-specific models from speaker-independent ones to minimize the mismatch between training and testing conditions arising from speaker variability. One commonly adopted approach is the transformation-based method. In this paper, discriminative input and output transforms for speaker adaptation in hybrid NN/HMM systems are compared and further investigated under both structural and data-driven constraints. Experimental results show that the data-driven constrained discriminative transforms are much more robust for unsupervised adaptation.
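
As a rough illustration of where input and output transforms sit relative to a speaker-independent network, the following minimal sketch wraps a frozen toy network with two affine transforms, one on the input features and one on the pre-softmax outputs, and takes a single cross-entropy gradient step on the output transform. Everything here is an assumption made for illustration: the toy weights `W_SI` and `B_SI`, the helper names, the identity initialisation, the learning rate and the single-layer network are hypothetical, and the sketch does not reproduce the paper's models, its structural or data-driven constraints, or its training setup.

```python
import math

def affine(A, b, x):
    """Return A @ x + b for a list-of-lists matrix A and list vectors b, x."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]

def softmax(z):
    """Numerically stable softmax over a list of activations."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def identity(n):
    """n x n identity matrix, used so adaptation starts from the SI system."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical frozen speaker-independent network: one affine layer, 2 inputs, 3 classes.
W_SI = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.1]]
B_SI = [0.0, 0.1, -0.1]

def posteriors(x, A_in, b_in, A_out, b_out):
    """Apply the input transform, the frozen network, then the output transform."""
    x_adapted = affine(A_in, b_in, x)          # input transform on the features
    z = affine(W_SI, B_SI, x_adapted)          # frozen SI network activations
    z_adapted = affine(A_out, b_out, z)        # output transform before the softmax
    return softmax(z_adapted), z

def output_transform_step(x, target, A_in, b_in, A_out, b_out, lr=0.1):
    """One gradient-descent step on the output transform under cross-entropy."""
    p, z = posteriors(x, A_in, b_in, A_out, b_out)
    # Gradient of cross-entropy w.r.t. the pre-softmax activations is (p - y).
    err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    new_A = [[A_out[k][j] - lr * err[k] * z[j] for j in range(len(z))]
             for k in range(len(err))]
    new_b = [b_out[k] - lr * err[k] for k in range(len(err))]
    return new_A, new_b

if __name__ == "__main__":
    x, target = [1.0, 2.0], 0
    A_in, b_in = identity(2), [0.0, 0.0]
    A_out, b_out = identity(3), [0.0, 0.0, 0.0]
    before, _ = posteriors(x, A_in, b_in, A_out, b_out)
    A_out, b_out = output_transform_step(x, target, A_in, b_in, A_out, b_out)
    after, _ = posteriors(x, A_in, b_in, A_out, b_out)
    print(before[target], after[target])
```

With both transforms initialised to the identity, the adapted system initially reproduces the speaker-independent posteriors; only the transform parameters change during adaptation, while the network weights stay fixed. An analogous step for the input transform would back-propagate the same error through the frozen network to the features.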


Bibliographic reference. Li, Bo / Sim, Khe Chai (2010): "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems", in INTERSPEECH-2010, 526-529.