Sixth International Conference on Spoken Language Processing
This paper presents a speaker adaptation framework that combines speaker normalization (SN) training with model adaptation. Because of the variability among training speakers, a speaker-independent (SI) acoustic model requires more data for training and adaptation. We present a very simple but effective normalization method in which the distortions among different speakers are removed by subtracting the state-relative shift vectors between the SI model and the speaker-dependent (SD) model. In the adaptation stage, maximum a posteriori (MAP) estimation is used to update the models with the adaptation data, and the interpolation of unseen models and the smoothing of the final models are implemented by the order-alterable weighted neighbor regression (WNR) method. In a Mandarin syllable recognition task, with equal amounts of adaptation data, using the SN model as the seed model yields an additional 5% to 15% reduction in error rate compared with using the SI model as the seed model.
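The normalization and MAP steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that each shift vector is the difference between the SD and SI mean of the same state, that frames are already hard-aligned to states, and that the MAP update is the common mean-only form with a prior weight `tau`. All function and parameter names are hypothetical, and the WNR interpolation step is not covered.

```python
import numpy as np

def shift_vectors(si_means, sd_means):
    """Per-state shift between SD and SI means (assumed sign: SD minus SI).

    Both arrays have shape (num_states, feature_dim).
    """
    return np.asarray(sd_means, dtype=float) - np.asarray(si_means, dtype=float)

def normalize_frames(frames, state_ids, shifts):
    """Subtract from each frame the shift vector of the state it is aligned to.

    frames: (num_frames, feature_dim); state_ids: (num_frames,) integer alignment.
    """
    frames = np.asarray(frames, dtype=float)
    return frames - shifts[np.asarray(state_ids)]

def map_update_mean(prior_mean, frames, tau=10.0):
    """Mean-only MAP update: (tau * prior + sum of frames) / (tau + n).

    tau is an assumed prior weight; with no frames the prior is returned.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float).reshape(-1, prior_mean.shape[0])
    n = frames.shape[0]
    if n == 0:
        return prior_mean.copy()
    return (tau * prior_mean + frames.sum(axis=0)) / (tau + n)
```

In this sketch, training an SN seed model would mean applying `normalize_frames` to each training speaker's data before re-estimating the model, and `map_update_mean` would then adapt each state mean that has adaptation data. With little adaptation data the result stays close to the seed mean, and it moves toward the sample mean as more frames are observed.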
Bibliographic reference. He, Lei / Fang, Ditang / Wu, Wenhu (2000): "Speaker normalization training and adaptation for speech recognition", In ICSLP-2000, vol.4, 342-345.