2001: A Speaker Odyssey - The Speaker Recognition Workshop
June 18-22, 2001
In an unlabeled and unsegmented conversation, i.e. when no a priori knowledge about the speakers' identities or the segment boundaries is provided, it is very important to cluster the conversation (segment and label it) with the best possible resolution. This work presents the performance of a system that employs different segment lengths. We assumed that the number of speakers is known, and high-quality conversations were used. Each speaker was modeled by a Self-Organizing Map (SOM). An iterative algorithm allows the data to move from one model to another and adjusts the SOMs. The restriction that the data can move only in small groups, rather than one feature vector at a time, forces the SOMs to adapt to speakers (instead of phonemes or other vocal events). We found that the optimal segment duration was half a second. The system has a clustering performance of about 90% for two-speaker conversations and over 80% for three-speaker conversations.
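The iterative procedure described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the features, the SOM size or topology, the distance measure, the initialization, or the stopping rule. The names `train_som`, `segment_cost`, `cluster`, and the parameter `seg_len` (segment length in feature vectors) are hypothetical, and a tiny one-dimensional SOM with squared Euclidean distance stands in for whatever models the paper used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, n_units=4, epochs=5, lr=0.3, seed=0):
    """Train a tiny 1-D SOM (a chain of units); sizes are assumed values."""
    rng = random.Random(seed)
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays per epoch
        for v in vectors:
            best = min(range(n_units), key=lambda i: sq_dist(units[i], v))
            for i in range(n_units):
                # The winner moves fully, its chain neighbours move half as much.
                h = 1.0 if i == best else 0.5 if abs(i - best) == 1 else 0.0
                units[i] = [u + rate * h * (x - u) for u, x in zip(units[i], v)]
    return units


def segment_cost(units, segment):
    """Total quantization error of a whole segment under one SOM."""
    return sum(min(sq_dist(u, v) for u in units) for v in segment)


def cluster(features, n_speakers, seg_len, n_iter=10, seed=0):
    """Assign fixed-length segments to speaker models and retrain iteratively."""
    # Data moves only in groups of seg_len vectors, never vector by vector.
    segments = [features[i:i + seg_len] for i in range(0, len(features), seg_len)]
    rng = random.Random(seed)
    labels = [rng.randrange(n_speakers) for _ in segments]  # assumed random start
    for _ in range(n_iter):
        soms = []
        for s in range(n_speakers):
            data = [v for seg, lab in zip(segments, labels) if lab == s for v in seg]
            if not data:  # an empty model is retrained on all data (assumption)
                data = features
            soms.append(train_som(data, seed=seed + s))
        new_labels = [
            min(range(n_speakers), key=lambda s: segment_cost(soms[s], seg))
            for seg in segments
        ]
        if new_labels == labels:  # stop once no segment changes model
            break
        labels = new_labels
    return labels
```

Calling `cluster(features, n_speakers=2, seg_len=50)` would correspond to half-second segments if feature vectors were extracted every 10 ms, which is itself an assumption. With random initial labels this sketch can settle on a poor partition; the abstract does not say how the models were initialized.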
Bibliographic reference. Lapidot, Itshak / Guterman, Hugo (2001): "Resolution limitation in speakers clustering and segmentation problems", In ODYSSEY-2001, 169-174.