Sixth International Conference on Spoken Language Processing
We previously proposed an incremental speaker adaptation method combined with automatic speaker-change detection for broadcast news transcription, where speakers change frequently and each speaker utters several sentences in succession. In this method, speaker changes are detected using speaker-independent and speaker-adaptive Gaussian mixture models (GMMs). Both phone HMMs and GMMs are incrementally adapted to each speaker by a combination of the MLLR, MAP and VFS methods, using speaker-independent (SI) models as initial models. This paper proposes an improvement in which the initial model for speaker adaptation is selected from a set of models built by speaker clustering. Either cluster-dependent phone HMMs or cluster-dependent GMMs are used to calculate the likelihood for selecting the best initial model. In a broadcast news transcription task, the proposed method significantly reduces word error rate compared with the method that uses the SI-HMM as the initial model. Online incremental speaker adaptation results show that word error rate is reduced by 11.6% relative to the baseline system with no speaker adaptation. Cluster selection using GMMs requires significantly fewer computations than cluster selection using HMMs.
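The GMM-based cluster selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, and the function names, cluster names, model parameters and feature frames are all hypothetical toy values. Each cluster GMM scores the incoming frames, and the cluster with the highest total log-likelihood supplies the initial model for adaptation.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of frames under a GMM.

    gmm is a list of (weight, mean, variance) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over the mixture components for numerical stability.
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total

def select_initial_cluster(frames, cluster_gmms):
    """Return the name of the cluster whose GMM scores the frames highest."""
    return max(
        cluster_gmms,
        key=lambda name: gmm_log_likelihood(frames, cluster_gmms[name]),
    )

# Hypothetical two-cluster setup with made-up 2-D parameters.
cluster_gmms = {
    "cluster_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cluster_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
# Made-up feature frames from the current speaker.
frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]

best = select_initial_cluster(frames, cluster_gmms)
print(best)
```

In this sketch the score is a frame-by-frame sum that needs no alignment of the frames to a phone sequence, which is the kind of property that makes GMM scoring cheap; the HMM-based alternative mentioned in the abstract would instead score the same frames with cluster-dependent phone HMMs.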
Bibliographic reference. Zhang, Zhipeng / Furui, Sadaoki (2000): "An online incremental speaker adaptation method using speaker-clustered initial models", In ICSLP-2000, vol.3, 694-697.