EUROSPEECH 2003 - INTERSPEECH 2003
This paper addresses unsupervised speaker indexing for discussion audio archives. We propose a flexible framework that selects the optimal speaker model (GMM or VQ) for the input utterances using the Bayesian Information Criterion (BIC). The framework makes it possible to use a discrete model when data are sparse, and to switch seamlessly to a continuous model once a large cluster has been obtained. Speaker indexing is also applied to automatic speech recognition of discussions, where it is evaluated by adapting a speaker-independent acoustic model to each participant. The results demonstrate that indexing with our method is sufficiently accurate for speaker adaptation.
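As a rough illustration of BIC-based model selection, the sketch below scores each candidate model by its log-likelihood minus a complexity penalty and keeps the best one. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `bic_score` and `select_model`, the `penalty_weight` parameter, and the "larger is better" form of BIC are choices made here for illustration, and the abstract does not say how the likelihood of the VQ model is computed.

```python
import math


def bic_score(log_likelihood, num_params, num_frames, penalty_weight=1.0):
    """BIC in the 'larger is better' form: log L - (w / 2) * k * log N.

    log_likelihood: total log-likelihood of the data under the model.
    num_params:     number of free parameters k in the model.
    num_frames:     number of feature frames N in the data.
    penalty_weight: hypothetical tuning weight w on the penalty term.
    """
    penalty = 0.5 * penalty_weight * num_params * math.log(num_frames)
    return log_likelihood - penalty


def select_model(candidates, num_frames, penalty_weight=1.0):
    """Return (best_name, scores) for the candidate with the highest BIC.

    candidates maps a model name to a (log_likelihood, num_params) pair.
    """
    scores = {
        name: bic_score(ll, k, num_frames, penalty_weight)
        for name, (ll, k) in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores


if __name__ == "__main__":
    # All numbers below are invented for illustration only; they are not
    # results from the paper.
    sparse = {"vq": (-5200.0, 64), "gmm": (-5100.0, 512)}
    print(select_model(sparse, num_frames=200)[0])

    large = {"vq": (-520000.0, 64), "gmm": (-505000.0, 512)}
    print(select_model(large, num_frames=20000)[0])
```

With the invented numbers above, the penalty dominates when only a few frames are available, so the model with fewer parameters is kept. Once the cluster is large, the likelihood gain of the richer model outweighs its penalty. This mirrors the switching behaviour described in the abstract, although the actual parameter counts and likelihoods in the paper may differ.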
Bibliographic reference. Nishida, Masafumi / Kawahara, Tatsuya (2003): "Speaker model selection using Bayesian information criterion for speaker indexing and speaker adaptation", In EUROSPEECH-2003, 1849-1852.