8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


An Improved Model-Based Speaker Segmentation System

Peng Yu, Frank Seide, Chengyuan Ma, Eric Chang

Microsoft Research Asia, China

In this paper, we report our recent work on speaker segmentation. The audio stream is segmented, and segments belonging to the same speaker are grouped together, without a priori information about the number of speakers or their identities. Speakers are represented by Gaussian Mixture Models (GMMs), and an HMM network is then used for segmentation. Unlike other model-based segmentation methods, however, our system initializes the speaker GMMs with a simpler distance-based segmentation algorithm. To group segments of identical speakers, we introduce a two-level clustering mechanism, which we found to achieve higher accuracy than direct distance-based clustering methods. Our method significantly outperforms the best result reported at the 2002 Speaker Recognition Workshop. When tested on a set of professionally produced TV programs, our system produces only 3.5% frame errors.
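The abstract does not say which distance measure drives the initial distance-based segmentation. As a minimal sketch of the general idea only, the following assumes one-dimensional features, a single Gaussian fitted to each of two adjacent sliding windows, and a symmetric Kullback-Leibler distance between them; a speaker change is hypothesized wherever that distance peaks above a threshold. The names `gaussian_kl2`, `detect_changes`, and `toy_frames`, as well as the window length and threshold, are hypothetical and not taken from the paper.

```python
from statistics import fmean, pvariance


def gaussian_kl2(a, b, var_floor=1e-6):
    """Symmetric KL divergence between 1-D Gaussians fitted to windows a and b.

    Assumption: this distance is chosen for illustration; the paper's
    abstract does not name the measure it uses.
    """
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = max(pvariance(a, mu_a), var_floor)  # floor avoids division by zero
    var_b = max(pvariance(b, mu_b), var_floor)
    d2 = (mu_a - mu_b) ** 2
    return (0.5 * (var_a / var_b + var_b / var_a - 2.0)
            + 0.5 * d2 * (1.0 / var_a + 1.0 / var_b))


def detect_changes(frames, window=20, threshold=5.0):
    """Return frame indices where the adjacent-window distance has a local peak
    at or above `threshold` (window and threshold values are illustrative)."""
    scores = {}
    for t in range(window, len(frames) - window + 1):
        scores[t] = gaussian_kl2(frames[t - window:t], frames[t:t + window])

    changes = []
    for t, s in scores.items():
        left = scores.get(t - 1, float("-inf"))
        right = scores.get(t + 1, float("-inf"))
        if s >= threshold and s > left and s >= right:
            changes.append(t)
    return changes


def toy_frames():
    """Synthetic stream: 50 frames near 0.0 followed by 50 frames near 5.0."""
    ripple = [0.1 * ((i % 5) - 2) for i in range(100)]
    return [r if i < 50 else 5.0 + r for i, r in enumerate(ripple)]


if __name__ == "__main__":
    print(detect_changes(toy_frames()))
```

In a model-based system of the kind described above, segments found this way would only serve to initialize the speaker GMMs; the final boundaries would come from the HMM-based segmentation, which this sketch does not cover.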


Bibliographic reference. Yu, Peng / Seide, Frank / Ma, Chengyuan / Chang, Eric (2003): "An improved model-based speaker segmentation system", in EUROSPEECH-2003, 2025-2028.