INTERSPEECH 2004 - ICSLP
This paper presents a three-stage adaptation framework based on speaker selection training. First, a subset of cohort speakers is selected for the test speaker using Gaussian mixture models, which are more reliable when very little adaptation data is available. Then the cohort models are linearly transformed to bring them closer to the test speaker. Finally, the adapted model for the test speaker is obtained by combining these transformed models. The combination weights and bias terms are learned adaptively from the adaptation data. Experiments show that transforming the models before combining them improves the robustness of the scheme. With only 30 s of adaptation data, a relative error rate reduction of about 14.9% is achieved on a large-vocabulary continuous speech recognition task.
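The three stages described in the abstract (select, transform, combine) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each speaker model is just a list of 1-D state means with unit variance instead of a full HMM/GMM, it uses a single scalar affine transform per cohort model, and it learns the combination weights and bias by ridge least squares. All function names and the `ridge` parameter are hypothetical.

```python
# Toy sketch only. Each speaker "model" is a list of 1-D state means with unit
# variance, standing in for real HMM/GMM parameters (assumption).
# Adaptation data is a list of (state_index, observation) pairs.

def log_likelihood(means, data):
    """Unit-variance Gaussian log-likelihood of the data, up to a constant."""
    return sum(-0.5 * (x - means[s]) ** 2 for s, x in data)

def select_cohort(pool, data, n):
    """Stage 1: keep the n speakers whose models score the adaptation data best."""
    ranked = sorted(pool, key=lambda name: log_likelihood(pool[name], data),
                    reverse=True)
    return ranked[:n]

def fit_affine(means, data):
    """Stage 2: least-squares a, b such that a * mean + b fits the data."""
    xs = [means[s] for s, _ in data]
    ys = [x for _, x in data]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((v - mx) ** 2 for v in xs)
    if var < 1e-12:  # only one distinct mean was observed: fit a shift only
        return 1.0, my - mx
    a = sum((v - mx) * (y - my) for v, y in zip(xs, ys)) / var
    return a, my - a * mx

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def combine(models, data, ridge=1e-3):
    """Stage 3: ridge least-squares weights plus one bias over the models."""
    rows = [[m[s] for m in models] + [1.0] for s, _ in data]
    ys = [x for _, x in data]
    d = len(models) + 1
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    sol = solve(A, b)
    return sol[:-1], sol[-1]

def adapt(pool, data, n=2):
    """Run the three stages and return the adapted state means."""
    cohort = select_cohort(pool, data, n)
    transformed = []
    for name in cohort:
        a, b = fit_affine(pool[name], data)
        transformed.append([a * m + b for m in pool[name]])
    weights, bias = combine(transformed, data)
    n_states = len(transformed[0])
    return [sum(w * t[s] for w, t in zip(weights, transformed)) + bias
            for s in range(n_states)]

if __name__ == "__main__":
    pool = {"A": [0.0, 1.0, 2.0], "B": [0.0, 2.0, 4.0], "C": [10.0, -5.0, 3.0]}
    data = [(0, 1.0), (1, 3.0), (2, 5.0)]  # a few frames from the test speaker
    print(select_cohort(pool, data, 2))
    print(adapt(pool, data, 2))
```

The ridge term is included because cohort models that have all been transformed toward the same speaker are nearly collinear, which would otherwise make the weight estimate ill-conditioned. Whether the paper uses any such regularization is not stated in the abstract.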
Bibliographic reference. Huang, Chao / Chen, Tao / Chang, Eric (2004): "Transformation and combination of hidden Markov models for speaker selection training", In INTERSPEECH-2004, 9-12.