5th International Conference on Spoken Language Processing
In this paper, speaker clustering schemes are investigated in the context of improving unsupervised adaptation for broadcast news transcription. The various techniques are presented within a framework of top-down split-and-merge clustering. Since these schemes are to be used for adaptation based on maximum likelihood linear regression (MLLR), a natural evaluation metric for clustering is the increase in data likelihood from adaptation. Two types of cluster-splitting criteria have been used. The first minimises a covariance-based distance measure; for the second, we introduce a two-step EM-type procedure to form clusters that directly maximise the likelihood of the adapted data. It is shown that the direct maximisation technique produces a higher data likelihood and also gives a reduction in word error rate.
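The two-step procedure mentioned in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes one-dimensional frames, a fixed unit variance, and a single mean offset per cluster as a toy stand-in for an MLLR transform. The round-robin initialisation and the names `log_likelihood`, `estimate_offset`, `fit_offsets` and `split_cluster` are hypothetical, chosen only for illustration. The sketch alternates between estimating one "transform" per child cluster and reassigning each segment to the child whose transform gives it the higher likelihood.

```python
import math


def log_likelihood(frames, offset, var=1.0):
    """Log-likelihood of 1-D frames under a Gaussian N(offset, var)."""
    const = -0.5 * math.log(2.0 * math.pi * var)
    return sum(const - 0.5 * (x - offset) ** 2 / var for x in frames)


def estimate_offset(segments):
    """ML estimate of a shared mean offset (toy stand-in for an MLLR transform)."""
    frames = [x for seg in segments for x in seg]
    return sum(frames) / len(frames)


def fit_offsets(segments, assign):
    """Estimate one offset per child cluster; None marks an empty child."""
    offsets = []
    for child in (0, 1):
        members = [seg for seg, a in zip(segments, assign) if a == child]
        offsets.append(estimate_offset(members) if members else None)
    return offsets


def split_cluster(segments, n_iter=20):
    """Split one cluster into two children by alternating two steps."""
    # Assumed initialisation: round-robin assignment of segments to children.
    assign = [i % 2 for i in range(len(segments))]
    for _ in range(n_iter):
        # Step 1: estimate a transform for each child from its current members.
        offsets = fit_offsets(segments, assign)
        # Step 2: move each segment to the child that maximises its likelihood.
        new_assign = []
        for seg in segments:
            scores = [
                log_likelihood(seg, o) if o is not None else -math.inf
                for o in offsets
            ]
            new_assign.append(0 if scores[0] >= scores[1] else 1)
        if new_assign == assign:
            break  # assignments are stable, so the likelihood cannot improve
        assign = new_assign
    # Re-estimate so the returned offsets match the final assignment.
    offsets = fit_offsets(segments, assign)
    total = sum(log_likelihood(seg, offsets[a]) for seg, a in zip(segments, assign))
    return assign, offsets, total
```

In this toy setting the total log-likelihood returned by `split_cluster` can be compared with the likelihood under a single offset for the unsplit cluster, which mirrors the evaluation metric described above (the increase in data likelihood from adaptation). A real system would replace the offset with MLLR transforms applied to the recogniser's Gaussian means.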
Bibliographic reference. Johnson, Sue E. / Woodland, Philip C. (1998): "Speaker clustering using direct maximisation of the MLLR-adapted likelihood", In ICSLP-1998, paper 0726.