Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

An Online Incremental Speaker Adaptation Method Using Speaker-Clustered Initial Models

Zhipeng Zhang, Sadaoki Furui

Tokyo Institute of Technology, Department of Computer Science, Meguro-ku, Tokyo, Japan

We previously proposed an incremental speaker adaptation method combined with automatic speaker-change detection for broadcast news transcription, where speakers change frequently and each speaker utters several consecutive sentences. In this method, speaker changes are detected using speaker-independent and speaker-adaptive Gaussian mixture models (GMMs). Both phone HMMs and GMMs are incrementally adapted to each speaker by a combination of the MLLR, MAP and VFS methods, using speaker-independent (SI) models as initial models. This paper proposes an improvement to that method in which the initial model for speaker adaptation is selected from a set of models built by speaker clustering. Either cluster-dependent phone HMMs or cluster-dependent GMMs are used to calculate the likelihood for selecting the best initial model. In a broadcast news transcription task, the proposed method significantly reduces the word error rate compared with the method using the SI-HMM as the initial model. Online incremental speaker adaptation results show that the word error rate is reduced by 11.6% relative to the baseline system with no speaker adaptation. The method using GMMs for cluster selection requires significantly fewer computations than the method using HMMs.
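The GMM-based cluster selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gmm_log_likelihood` and `select_initial_cluster`, the diagonal-covariance assumption, and the `(weights, means, variances)` tuple layout are all hypothetical choices made for illustration. The sketch scores a block of feature frames against each cluster-dependent GMM and returns the index of the cluster with the highest total log-likelihood, which would then supply the initial model for adaptation.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames under a diagonal-covariance GMM.

    Assumed shapes (illustrative, not from the paper):
    frames (T, D), weights (K,), means (K, D), variances (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, K)
    log_weighted = log_comp + np.log(weights)                          # (T, K)
    # Log-sum-exp over mixture components, stabilised by the per-frame maximum.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.sum())


def select_initial_cluster(frames, cluster_gmms):
    """Return (index of best-scoring cluster, list of per-cluster scores).

    cluster_gmms is a list of (weights, means, variances) tuples,
    one per speaker cluster (hypothetical data layout).
    """
    scores = [gmm_log_likelihood(frames, *gmm) for gmm in cluster_gmms]
    return int(np.argmax(scores)), scores
```

Because each cluster is scored with a single GMM rather than by decoding with a full set of phone HMMs, a selection step of this kind is cheap, which is consistent with the lower computational cost reported for the GMM-based variant.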



Bibliographic reference. Zhang, Zhipeng / Furui, Sadaoki (2000): "An online incremental speaker adaptation method using speaker-clustered initial models", In ICSLP-2000, vol. 3, 694-697.