Accessing Information in Spoken Audio

April 19-20, 1999
Cambridge, UK

Speaker-based segmentation for audio data indexing

Perrine Delacourt, David Kryze, and Christian J. Wellekens

EURECOM, Sophia Antipolis, France

In this paper, we address the problem of speaker-based segmentation, which is a necessary first step for several indexing tasks. It consists of recognizing, from their voices, the sequence of people engaged in a conversation. In our context, we assume no prior knowledge of the speaker characteristics (no speaker model, no speech model, no training phase). We do, however, assume that people do not speak simultaneously. Our segmentation technique takes advantage of two different types of segmentation algorithms. It is organized in two passes: first, the most likely speaker change points are detected; then, they are validated or discarded. Our algorithm detects speaker change points efficiently even when they are close to one another, and is therefore suited to segmenting conversations containing segments of any length.
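The two-pass structure described above (detect candidate change points, then validate or discard them) can be illustrated with a minimal sketch. The abstract does not name the measures used in either pass, so everything below is an assumption chosen for illustration. In the first pass, a generalized likelihood ratio (GLR) distance compares two adjacent sliding windows, and local maxima of the distance curve become candidates. In the second pass, a Delta-BIC test on the segments surrounding each candidate decides whether to keep it. Frames are treated as feature vectors modeled by diagonal-covariance Gaussians. The function names, window width `win`, step `step`, penalty weight `lam`, and the "above the curve's mean" threshold rule are all hypothetical.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of a diagonal-covariance ML Gaussian fit to `frames`."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-8))  # variance floor avoids log(0)
    return total


def glr_distance(x, y):
    """Log likelihood ratio of 'two Gaussians (x, y)' vs 'one Gaussian (x+y)'.
    Larger values mean the two blocks of frames look less alike."""
    z = x + y
    return 0.5 * (len(z) * log_det_diag(z)
                  - len(x) * log_det_diag(x)
                  - len(y) * log_det_diag(y))


def delta_bic(x, y, lam=1.0):
    """GLR minus a BIC penalty for the extra 2*dim parameters (means and
    variances) of the two-Gaussian hypothesis; > 0 favours a speaker change."""
    n, dim = len(x) + len(y), len(x[0])
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return glr_distance(x, y) - penalty


def first_pass(frames, win=50, step=10):
    """Hypothetical pass 1: slide two adjacent windows over the frames and keep
    local maxima of the distance curve that exceed the curve's mean."""
    positions = list(range(win, len(frames) - win + 1, step))
    dist = [glr_distance(frames[p - win:p], frames[p:p + win]) for p in positions]
    mean = sum(dist) / len(dist)
    candidates = []
    for i in range(1, len(dist) - 1):
        if dist[i] > dist[i - 1] and dist[i] >= dist[i + 1] and dist[i] > mean:
            candidates.append(positions[i])
    return candidates


def second_pass(frames, candidates, lam=1.0):
    """Hypothetical pass 2: test each candidate with Delta-BIC on the segment
    since the last accepted point and the segment up to the next candidate.
    A discarded candidate simply merges its two neighbouring segments."""
    accepted = []
    bounds = candidates + [len(frames)]
    start = 0
    for i, c in enumerate(candidates):
        left, right = frames[start:c], frames[c:bounds[i + 1]]
        if delta_bic(left, right, lam) > 0:
            accepted.append(c)
            start = c
    return accepted


def segment(frames, win=50, step=10, lam=1.0):
    """Return the frame indices accepted as speaker change points."""
    return second_pass(frames, first_pass(frames, win, step), lam)


if __name__ == "__main__":
    # Synthetic 2-D "features": speaker A for 200 frames, then speaker B.
    random.seed(0)
    a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
    b = [(random.gauss(4, 1), random.gauss(4, 1)) for _ in range(200)]
    print(segment(a + b))
```

In this sketch, the first pass uses short, fixed-size windows, so it can propose change points that lie close together. The second pass judges each proposal on the longer segments between neighbouring boundaries, which gives it more data for a reliable accept-or-discard decision. This is one plausible way to realize the behaviour the abstract describes, not a reproduction of the authors' algorithm.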

