7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper describes and evaluates a process for segmenting two-speaker telephone conversations by speaker without prior speaker models. The process consists of an initial segmentation based on acoustic change and pause detection, followed by segment clustering and then iterative modeling of the segment clusters and resegmentation. The technique was evaluated on six customer care conversations, each approximately 3 minutes long. It does not resolve short (< 2 s) or overlapping segments well, but it detects longer segments (> 4 s) with miss rates on the order of 10% and confusion rates of 2% or less.
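The three stages named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify features, models, or thresholds, so everything below is an assumption made for illustration. Speech is represented by a per-frame energy list and a single scalar feature per frame, the initial segmentation uses pause detection only (no acoustic change detection), each speaker "model" is simply a mean feature value, and the resegmentation step only relabels existing segments rather than moving their boundaries. The function names `split_on_pauses` and `label_two_speakers` are hypothetical.

```python
from statistics import mean


def split_on_pauses(energy, threshold, min_pause):
    """Return (start, end) frame spans of speech, split wherever at least
    min_pause consecutive frames fall below the energy threshold.
    The end index is exclusive."""
    segments, start, silent = [], None, 0
    for i, e in enumerate(energy):
        if e >= threshold:
            if start is None:
                start = i          # speech begins
            silent = 0             # a short pause inside speech is ignored
        elif start is not None:
            silent += 1
            if silent >= min_pause:
                # close the segment at the first frame of the pause
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:
        # close a segment still open at the end, dropping trailing silence
        segments.append((start, len(energy) - silent))
    return segments


def label_two_speakers(feature, segments, iterations=10):
    """Assign each segment to one of two clusters, then alternate between
    re-estimating each cluster model (here just a mean, an assumption)
    and relabeling the segments, until the labels stop changing."""
    seg_means = [mean(feature[s:e]) for s, e in segments]
    # initial clustering: seed the two models with the extreme segments
    models = [min(seg_means), max(seg_means)]
    labels = []
    for _ in range(iterations):
        new_labels = [
            0 if abs(m - models[0]) <= abs(m - models[1]) else 1
            for m in seg_means
        ]
        if new_labels == labels:
            break                  # converged
        labels = new_labels
        for k in (0, 1):
            frames = [
                x
                for (s, e), lab in zip(segments, labels)
                if lab == k
                for x in feature[s:e]
            ]
            if frames:
                models[k] = mean(frames)   # re-estimate the cluster model
    return labels


# Synthetic toy data (not from the paper): three speech bursts separated by pauses.
energy = [1] * 5 + [0] * 3 + [1] * 4 + [0] * 3 + [1] * 6
feature = [100] * 5 + [0] * 3 + [200] * 4 + [0] * 3 + [105] * 6
segments = split_on_pauses(energy, threshold=0.5, min_pause=2)
labels = label_two_speakers(feature, segments)
```

In this toy input the first and third bursts have similar feature values, so they end up with the same label while the second burst gets the other one. A real system would replace the scalar feature and mean-based models with spectral features and statistical speaker models, and would re-derive segment boundaries during resegmentation.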
Bibliographic reference. Rosenberg, Aaron E. / Gorin, Allen / Liu, Zhu / Parthasarathy, S. (2002): "Unsupervised speaker segmentation of telephone conversations", In ICSLP-2002, 565-568.