Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Principal Mixture Speaker Adaptation for Improved Continuous Speech Recognition

Hui Ye (1), Pascale Fung (1), Taiyi Huang (2)

(1) Human Language Technology Center (HLTC), Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong
(2) National Lab of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China

Nowadays, almost all speaker-independent (SI) speech recognition systems use continuous-density HMMs (CDHMMs) with multivariate Gaussian mixtures as observation densities to cover speaker variability. It has been shown that, given sufficient training data, the more mixture components are used in the HMM observation density, the better the system performs. However, an acoustic HMM with more Gaussian densities is more complex and slows down recognition. Another efficient way to handle speaker variation is speaker adaptation (SA). Yet, even though speaker adaptation of full multivariate Gaussian mixture densities can increase recognition accuracy, it does not improve recognition speed. In this paper, we introduce a principal mixture speaker adaptation method which reduces HMM complexity by choosing only the principal mixtures corresponding to a particular speaker's characteristics. We show that our method both improves recognition accuracy by 31.8% when compared to SI models and improves recognition speed by 30% when compared to full mixture SA models.
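
The abstract does not say how the principal mixtures are chosen, so the following is only an illustrative sketch of one plausible selection rule, not the authors' procedure. It assumes diagonal-covariance Gaussians with strictly positive weights, ranks the components of a single state's mixture by their summed posterior probability over a speaker's adaptation frames, keeps the top few, and renormalizes the surviving weights. The function names and the `keep` parameter are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def select_principal_mixtures(weights, means, variances, frames, keep):
    """Return (kept_indices, renormalized_weights) for one mixture density.

    Assumed selection rule (not taken from the paper): rank components by
    their total posterior occupancy on the speaker's adaptation frames and
    keep the `keep` highest-ranked ones. Weights must be strictly positive.
    """
    n = len(weights)
    occupancy = [0.0] * n
    for x in frames:
        # Per-component log joint: log weight plus log Gaussian likelihood.
        logp = [
            math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
            for k in range(n)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for k in range(n):
            occupancy[k] += post[k] / total
    ranked = sorted(range(n), key=lambda k: occupancy[k], reverse=True)
    kept = sorted(ranked[:keep])
    # Renormalize so the pruned mixture weights sum to one again.
    norm = sum(weights[k] for k in kept)
    return kept, [weights[k] / norm for k in kept]
```

Under this assumed rule, a state whose mixture is pruned from, say, four components to two needs only half as many Gaussian evaluations per frame for that speaker, which is the kind of complexity reduction the abstract describes.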


Bibliographic reference. Ye, Hui / Fung, Pascale / Huang, Taiyi (2000): "Principal mixture speaker adaptation for improved continuous speech recognition", in ICSLP-2000, vol. 1, 774-777.