EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

A Fast, Accurate and Stream-Based Speaker Segmentation and Clustering Algorithm

An Vandecatseye, Jean-Pierre Martens

Ghent University, Belgium

This paper describes a new pre-processor for a free speech transcription system. It partitions the audio into speech and non-speech, segments the speech parts into speaker turns, and clusters those speaker turns. It works in a stream-based mode and aims for high accuracy with low delay and processing time. Experiments on the Hub4 Broadcast News corpus show that the proposed pre-processor is competitive with, and in some respects better than, the best systems published so far. The paper also describes attempts to raise system performance by supplementing the standard MFCC features with prosodic features such as pitch and voicing evidence.


Bibliographic reference. Vandecatseye, An / Martens, Jean-Pierre (2003): "A fast, accurate and stream-based speaker segmentation and clustering algorithm", in EUROSPEECH-2003, 941-944.