ISCA Archive ICSLP 2000

An online incremental speaker adaptation method using speaker-clustered initial models

Zhipeng Zhang, Sadaoki Furui

We previously proposed an incremental speaker adaptation method combined with automatic speaker-change detection for broadcast news transcription, where speakers change frequently and each speaker utters a series of several sentences. In this method, speaker changes are detected using speaker-independent and speaker-adaptive Gaussian mixture models (GMMs). Both phone HMMs and GMMs are incrementally adapted to each speaker by a combination of the MLLR, MAP and VFS methods, using speaker-independent (SI) models as initial models. This paper proposes an improvement to that method in which the initial model for speaker adaptation is selected from a set of models built by speaker clustering. Either cluster-dependent phone HMMs or cluster-dependent GMMs are used to calculate the likelihood for selecting the best initial model. In a broadcast news transcription task, the proposed method significantly reduces word error rate compared with the method using the SI-HMM as the initial model. Online incremental speaker adaptation results show that word error rate is reduced by 11.6% relative to the baseline system with no speaker adaptation. The method using GMMs for cluster selection requires significantly fewer computations than the method using HMMs.
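
The GMM-based cluster selection step can be illustrated with a minimal sketch: score the incoming frames against each cluster-dependent GMM and pick the cluster with the highest total log-likelihood. The abstract does not give implementation details, so everything here is an assumption made for illustration: the diagonal covariances, the function names `diag_gmm_loglik` and `select_initial_cluster`, and the representation of each GMM as a `(weights, means, variances)` tuple.

```python
import math


def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames under one diagonal-covariance GMM.

    frames: list of feature vectors; weights: mixture weights;
    means, variances: one vector per mixture component (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log of (weight * diagonal Gaussian density) for this component
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp_logs.append(ll)
        # log-sum-exp over components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total


def select_initial_cluster(frames, cluster_gmms):
    """Return (index of best-scoring cluster, list of per-cluster scores)."""
    scores = [diag_gmm_loglik(frames, *gmm) for gmm in cluster_gmms]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

In this sketch, the phone HMMs associated with the selected cluster would then serve as the starting point for adaptation. Scoring a single GMM per cluster is cheap compared with evaluating a full set of phone HMMs per cluster, which is consistent with the lower computational cost the abstract reports for GMM-based selection.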


Cite as: Zhang, Z., Furui, S. (2000) An online incremental speaker adaptation method using speaker-clustered initial models. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 694-697

@inproceedings{zhang00h_icslp,
  author={Zhipeng Zhang and Sadaoki Furui},
  title={{An online incremental speaker adaptation method using speaker-clustered initial models}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 694-697}
}