8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Transformation and Combination of Hidden Markov Models for Speaker Selection Training

Chao Huang (1), Tao Chen (2), Eric Chang (1)

(1) Microsoft Research Asia, Beijing, China
(2) University of Newcastle, UK

This paper presents a three-stage adaptation framework based on speaker selection training. First, a subset of cohort speakers is selected for the test speaker using Gaussian mixture models, which are more reliable when very little adaptation data is available. Then the cohort models are linearly transformed to bring them closer to the test speaker. Finally, the adapted model for the test speaker is obtained by combining these transformed models. The combination weights and bias terms are learned adaptively from the adaptation data. Experiments show that transforming the models before combining them improves the robustness of the scheme. With only 30 s of adaptation data, a relative error rate reduction of about 14.9% is achieved on a large-vocabulary continuous speech recognition task.
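
The three stages can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy speaker models, the hand-picked affine transform, and the fixed combination weights and bias are all assumptions made for illustration. In the paper the transforms, weights, and bias terms are estimated from adaptation data, and that estimation step is not shown here.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of the frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total

def select_cohort(frames, speaker_gmms, k):
    """Stage 1: names of the k speakers whose GMMs best fit the adaptation frames."""
    ranked = sorted(
        speaker_gmms,
        key=lambda name: gmm_log_likelihood(frames, *speaker_gmms[name]),
        reverse=True,
    )
    return ranked[:k]

def transform_mean(A, b, mu):
    """Stage 2: apply an affine transform A*mu + b to one mean vector."""
    return [sum(a * m for a, m in zip(row, mu)) + b_i for row, b_i in zip(A, b)]

def combine_means(transformed_means, comb_weights, bias):
    """Stage 3: weighted sum of transformed cohort means plus a bias vector."""
    return [
        sum(w * mu[d] for w, mu in zip(comb_weights, transformed_means)) + bias[d]
        for d in range(len(bias))
    ]

# Toy data (hypothetical): one single-Gaussian "GMM" per training speaker,
# stored as (mixture weights, means, variances).
speaker_gmms = {
    "spk_a": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
    "spk_b": ([1.0], [[1.0, 1.0]], [[1.0, 1.0]]),
    "spk_c": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
adaptation_frames = [[0.9, 1.1], [1.2, 0.8]]

cohort = select_cohort(adaptation_frames, speaker_gmms, k=2)

# Hand-picked transform and weights stand in for the learned quantities.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.1, -0.1]
transformed = [transform_mean(A, b, speaker_gmms[name][1][0]) for name in cohort]
adapted_mean = combine_means(transformed, comb_weights=[0.75, 0.25], bias=[0.0, 0.0])
```

In this toy setup the two speakers nearest to the adaptation frames form the cohort, and the adapted mean is pulled mostly toward the best-matching speaker because it carries the larger combination weight.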


Bibliographic reference. Huang, Chao / Chen, Tao / Chang, Eric (2004): "Transformation and combination of hidden Markov models for speaker selection training", In INTERSPEECH-2004, 9-12.