7th International Conference on Spoken Language Processing
September 16-20, 2002
Speaker indexing of an audio database consists of organizing the audio data according to the speakers present in the database. It comprises three steps: (1) segmentation of each audio document by speaker; (2) speaker tying across the segmented portions of the various audio documents; and (3) generation of a speaker-based index. This paper focuses on the second step, speaker tying, which has not previously been addressed in the literature. The output of this task is a clustering of the segmented acoustic data in which each cluster should represent one speaker. This paper investigates hierarchical classification approaches to speaker tying and proposes two new discriminant dissimilarity measures and a new bottom-up algorithm. Experiments conducted on a subset of the Switchboard database, a corpus of conversational telephone speech, show that the proposed method achieves very satisfactory speaker tying across audio documents, with a good level of cluster purity, although the number of clusters is significantly higher than the number of speakers.
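To illustrate what bottom-up hierarchical classification of speaker segments looks like in general, here is a minimal generic sketch. It is not the paper's algorithm, and it does not implement the two discriminant dissimilarity measures, which the abstract does not specify. Each segment is assumed to be summarized by a hypothetical mean feature vector. The Euclidean distance, the average linkage, and the `threshold` stopping parameter are all illustrative stand-ins.

```python
import math
from itertools import combinations


def dissimilarity(a, b):
    # Hypothetical stand-in for a segment dissimilarity measure:
    # Euclidean distance between two segment mean vectors.
    return math.dist(a, b)


def average_linkage(ci, cj, segments):
    # Mean pairwise dissimilarity between the segments of two clusters.
    total = sum(dissimilarity(segments[i], segments[j]) for i in ci for j in cj)
    return total / (len(ci) * len(cj))


def bottom_up_tying(segments, threshold):
    # Start with one cluster per segment, then repeatedly merge the closest
    # pair of clusters until the smallest dissimilarity exceeds `threshold`
    # (a hypothetical stopping criterion).
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        (a, b), d = min(
            (((p, q), average_linkage(clusters[p], clusters[q], segments))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        # combinations yields p < q, so deleting index b leaves a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
print(bottom_up_tying(segments, threshold=1.0))  # [[0, 1], [2, 3]]
```

With this kind of stopping rule, a lower threshold halts merging earlier and generally leaves more, smaller clusters. Each cluster then tends to be purer, but there are more clusters than speakers.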
Bibliographic reference. Meignier, Sylvain / Bonastre, Jean-François / Magrin-Chagnolleau, Ivan (2002): "Speaker utterances tying among speaker segmented audio documents using hierarchical classification: towards speaker indexing of audio databases", In ICSLP-2002, 577-580.