Nowadays, almost all speaker-independent (SI) speech recognition systems use continuous-density hidden Markov models (CDHMMs) with multivariate Gaussian mixtures as observation densities to cover speaker variability. It has been shown that, given sufficient training data, the more mixture components are used in the HMM observation density, the better the system performs. However, an acoustic HMM with more Gaussian densities is more complex and slows down recognition. Another efficient way to handle speaker variation is speaker adaptation (SA). Yet even though speaker adaptation of full multivariate Gaussian mixture densities can increase recognition accuracy, it does not improve recognition speed. In this paper, we introduce a principal mixture speaker adaptation method that reduces HMM complexity by choosing only the principal mixtures corresponding to a particular speaker's characteristics. We show that our method improves recognition accuracy by 31.8% compared to SI models and reduces recognition time by 30% compared to full mixture SA models.
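As a rough illustration of what pruning a state's mixture to a speaker-specific subset could look like, here is a minimal sketch. The abstract does not state the selection criterion, so this is not the authors' procedure: it assumes each Gaussian component has an occupancy count accumulated from the speaker's adaptation data, and it keeps the smallest set of components covering a chosen fraction of that occupancy. The function name `select_principal_mixtures` and the `coverage` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_principal_mixtures(
    weights: Sequence[float],
    occupancies: Sequence[float],
    coverage: float = 0.9,
) -> Tuple[List[int], List[float]]:
    """Keep the components that account for most of a speaker's occupancy.

    ASSUMPTION: occupancy-based selection is an illustrative criterion only;
    the source abstract does not say how principal mixtures are chosen.
    Returns the kept component indices and their renormalized weights.
    """
    if len(weights) != len(occupancies):
        raise ValueError("weights and occupancies must have the same length")

    total = sum(occupancies)
    if total <= 0:
        # No adaptation evidence: leave the mixture untouched.
        return list(range(len(weights))), list(weights)

    # Visit components from most to least occupied by this speaker's data.
    order = sorted(range(len(occupancies)),
                   key=lambda i: occupancies[i], reverse=True)

    kept: List[int] = []
    covered = 0.0
    for i in order:
        kept.append(i)
        covered += occupancies[i]
        if covered >= coverage * total:
            break

    kept.sort()
    kept_mass = sum(weights[i] for i in kept)
    if kept_mass <= 0:
        # Degenerate weights: fall back to a uniform mixture over kept components.
        return kept, [1.0 / len(kept)] * len(kept)

    # Renormalize so the pruned mixture weights still sum to one.
    return kept, [weights[i] / kept_mass for i in kept]
```

With fewer components per state, each frame requires fewer Gaussian likelihood evaluations, which is the general mechanism by which a pruned mixture can decode faster than a fully adapted one.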
Cite as: Ye, H., Fung, P., Huang, T. (2000) Principal mixture speaker adaptation for improved continuous speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 774-777, doi: 10.21437/ICSLP.2000-192
@inproceedings{ye00_icslp, author={Hui Ye and Pascale Fung and Taiyi Huang}, title={{Principal mixture speaker adaptation for improved continuous speech recognition}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 1, 774-777}, doi={10.21437/ICSLP.2000-192} }