7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Speaker Utterances Tying Among Speaker Segmented Audio Documents Using Hierarchical Classification: Towards Speaker Indexing of Audio Databases

Sylvain Meignier (1), Jean-François Bonastre (1), Ivan Magrin-Chagnolleau (2)

(1) Université d’Avignon, France; (2) Université Lumière Lyon 2, France

Speaker indexing of an audio database consists of organizing the audio data according to the speakers present in the database. It comprises three steps: (1) segmentation of each audio document by speaker; (2) speaker tying among the segmented portions of the audio documents; and (3) generation of a speaker-based index. This paper focuses on the second step, the speaker tying task, which has not been addressed in the literature. The result of this task is a classification of the segmented acoustic data into clusters, where each cluster should represent one speaker. The paper investigates hierarchical classification approaches for speaker tying and proposes two new discriminant dissimilarity measures and a new bottom-up algorithm. Experiments are conducted on a subset of Switchboard, a conversational telephone database. They show that the proposed method achieves satisfactory speaker tying across audio documents, with a good level of cluster purity, but with a number of clusters significantly higher than the number of speakers.
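To make the speaker tying step concrete, the following is a minimal sketch of generic bottom-up (agglomerative) clustering with a purity score. It is not the algorithm or the dissimilarity measures proposed in the paper, which the abstract does not specify. The average-linkage rule, the `threshold` stopping criterion, the function names, and the toy one-dimensional features are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def agglomerate(dissim, threshold):
    """Bottom-up average-linkage clustering over a symmetric dissimilarity matrix.

    Merging stops when the closest pair of clusters is farther apart than
    `threshold` (a hypothetical stopping criterion, not taken from the paper).
    """
    clusters = [[i] for i in range(len(dissim))]

    def linkage(a, b):
        # Average pairwise dissimilarity between two clusters.
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average dissimilarity.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


def purity(clusters, labels):
    """Fraction of segments carrying the majority speaker label of their cluster."""
    majority = sum(Counter(labels[i] for i in c).most_common(1)[0][1]
                   for c in clusters)
    return majority / len(labels)


# Toy data (hypothetical): one scalar "feature" per segment and its true speaker.
features = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
speakers = ["A", "A", "A", "B", "B", "B"]
dissim = [[abs(a - b) for b in features] for a in features]

clusters = agglomerate(dissim, threshold=1.0)
print(len(clusters), purity(clusters, speakers))
```

On this toy input the clustering stays pure but leaves three clusters for two speakers, which mirrors the kind of outcome the abstract reports: good purity with more clusters than speakers.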

Bibliographic reference.  Meignier, Sylvain / Bonastre, Jean-François / Magrin-Chagnolleau, Ivan (2002): "Speaker utterances tying among speaker segmented audio documents using hierarchical classification: towards speaker indexing of audio databases", In ICSLP-2002, 577-580.