Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Speaker Normalization Training and Adaptation for Speech Recognition

Lei He, Ditang Fang, Wenhu Wu

Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science & Technology, Tsinghua University, Beijing, China

This paper presents a speaker adaptation framework that incorporates speaker normalization (SN) training. Because of the variability among training speakers, more data are required to train and adapt a speaker-independent (SI) acoustic model. A very simple but effective normalization method is presented, in which the distortions among different speakers are removed by subtracting the state-relative shift vectors between the SI model and the speaker-dependent (SD) model. In the adaptation stage, MAP estimation is used to update the models with adaptation data, and the interpolation of unseen models and the smoothing of the final models are implemented by an order-alterable weighted neighbor regression (WNR) method. In a Mandarin syllable recognition task, with equal amounts of adaptation data, using the SN model as the seed model yields an additional 5%-15% reduction in error rate compared with using the SI model as the seed model.
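The two steps named in the abstract, shift-vector normalization and MAP updating, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the shift for a state is taken to be the SD mean minus the SI mean, each frame is assumed to carry a state alignment, and the MAP step uses the standard mean-only update with a prior weight `tau`. The function names and the dictionary layout are hypothetical, and the WNR interpolation step is not covered.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def shift_vectors(si_means: Dict[str, Vector],
                  sd_means: Dict[str, Vector]) -> Dict[str, Vector]:
    """Per-state shift, assumed here to be SD mean minus SI mean."""
    return {
        state: [sd - si for sd, si in zip(sd_means[state], si_means[state])]
        for state in sd_means
        if state in si_means
    }


def normalize_frames(frames: Sequence[Vector],
                     alignment: Sequence[str],
                     shifts: Dict[str, Vector]) -> List[Vector]:
    """Subtract from each frame the shift of the state it is aligned to.

    Frames aligned to a state without a known shift are copied unchanged.
    """
    normalized = []
    for frame, state in zip(frames, alignment):
        shift = shifts.get(state)
        if shift is None:
            normalized.append(list(frame))
        else:
            normalized.append([x - b for x, b in zip(frame, shift)])
    return normalized


def map_update_mean(prior_mean: Vector,
                    frames: Sequence[Vector],
                    tau: float = 10.0) -> Vector:
    """Standard MAP mean update: (tau * prior + sum of frames) / (tau + n).

    tau is an assumed prior weight; larger values keep the result
    closer to the seed model's mean.
    """
    n = len(frames)
    sums = [sum(f[d] for f in frames) for d in range(len(prior_mean))]
    return [(tau * m + s) / (tau + n) for m, s in zip(prior_mean, sums)]
```

In this sketch, training the SN model would mean re-estimating state means from the normalized frames of all training speakers, and adaptation would call `map_update_mean` with the SN (or SI) state mean as `prior_mean`. With no adaptation frames for a state, the update returns the seed mean unchanged, which is the case the abstract handles with WNR interpolation.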

Bibliographic reference.  He, Lei / Fang, Ditang / Wu, Wenhu (2000): "Speaker normalization training and adaptation for speech recognition", In ICSLP-2000, vol.4, 342-345.