We previously proposed an incremental speaker adaptation method combined with automatic speaker-change detection for broadcast news transcription where speakers change frequently and each of them utters a series of several sentences. In this method, the speaker change is detected using speakerindependent and speaker-adaptive Gaussian mixture models (GMMs). Both phone HMMs and GMMs are incrementally adapted to each speaker by the combination of MLLR, MAP and VFS methods using speaker-independent (SI) models as initial models. This paper proposes its improvement in which an initial model for speaker adaptation is selected from a set of models made by speaker clustering. Either cluster-dependent phone HMMs or GMMs are used to calculate the likelihood for selecting the best initial model. In a broadcast news transcription task, the proposed method significantly reduces word error rate compared with the method using SI-HMM as an initial model. Online incremental speaker adaptation results show that word error rate is reduced by 11.6% relative to the baseline system with no speaker adaptation. The method using GMMs for cluster selection requires a significantly less number of computations than that using HMMs.
Cite as: Zhang, Z., Furui, S. (2000) An online incremental speaker adaptation method using speaker-clustered initial models. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 694-697
@inproceedings{zhang00h_icslp, author={Zhipeng Zhang and Sadaoki Furui}, title={{An online incremental speaker adaptation method using speaker-clustered initial models}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 3, 694-697} }