8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

A Semi-Automatic Approach for Speaker Mining of Tapped Telephone Conversations

Sandeep Manocha, Carol Y. Espy-Wilson

University of Maryland, USA

Speaker mining is the task of detecting speakers in a set of multi-speaker files. Previous work on speaker mining uses training data to construct target speaker models. This study considers a new speaker mining scenario in which there is no demarcation between training and testing data and no prior target speaker models are available. Given the ENRON database, which consists of tapped telephone conversations between traders and customers, the task is to identify conversations that have one or more speakers in common. Because the poor audio quality of this database makes automatic speaker segmentation ineffective, a new technique was developed in which a multi-speaker model is trained on the entire conversation, and various scoring strategies were tried. The resulting semi-automatic approach reduces the manual effort involved in speaker mining by 68%.
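
The idea of training one model per whole conversation and cross-scoring conversations can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the diagonal-covariance GMM with two components, the farthest-point initialisation, the symmetric "top fraction of frames" score, the function names (fit_gmm, score, rank_pairs), and the synthetic 4-dimensional "features" that stand in for real acoustic features.

```python
import numpy as np

def component_log_probs(frames, weights, means, variances):
    """Return an (n_frames, n_components) matrix of log(w_k * N(x | mu_k, var_k))."""
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    return np.log(weights)[None, :] + log_gauss

def fit_gmm(frames, n_components=2, n_iter=25, var_floor=1e-3):
    """Fit a diagonal GMM to ALL frames of one conversation (no segmentation)."""
    n = frames.shape[0]
    # Deterministic farthest-point initialisation of the means (an assumption).
    means = [frames[0]]
    for _ in range(1, n_components):
        dist = np.min([np.sum((frames - m) ** 2, axis=1) for m in means], axis=0)
        means.append(frames[np.argmax(dist)])
    means = np.array(means)
    variances = np.tile(frames.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame.
        log_p = component_log_probs(frames, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ frames) / nk[:, None]
        variances = np.maximum((resp.T @ frames**2) / nk[:, None] - means**2, var_floor)
    return weights, means, variances

def score(frames, model, top_fraction=0.5):
    """Mean log-likelihood of the best-matching fraction of frames (hypothetical strategy)."""
    ll = np.logaddexp.reduce(component_log_probs(frames, *model), axis=1)
    k = max(1, int(len(ll) * top_fraction))
    return float(np.sort(ll)[-k:].mean())

def rank_pairs(conversations, top_fraction=0.5):
    """Rank conversation pairs by a symmetric cross-likelihood score, best first."""
    models = {name: fit_gmm(x) for name, x in conversations.items()}
    names = sorted(conversations)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = 0.5 * (score(conversations[a], models[b], top_fraction)
                       + score(conversations[b], models[a], top_fraction))
            pairs.append((s, a, b))
    return sorted(pairs, reverse=True)

# Synthetic stand-in data: each "speaker" is a Gaussian cloud in 4 dimensions.
rng = np.random.default_rng(0)
dim = 4
speaker_means = {
    "trader": np.zeros(dim),
    "cust1": np.array([8.0, 0.0, 0.0, 0.0]),
    "cust2": np.array([0.0, 8.0, 0.0, 0.0]),
    "other1": np.array([0.0, 0.0, 8.0, 0.0]),
    "other2": np.array([0.0, 0.0, 0.0, 8.0]),
}

def talk(speaker, n=200):
    return speaker_means[speaker] + rng.normal(size=(n, dim))

conversations = {
    "call_A": np.vstack([talk("trader"), talk("cust1")]),   # shares "trader" with call_B
    "call_B": np.vstack([talk("trader"), talk("cust2")]),
    "call_C": np.vstack([talk("other1"), talk("other2")]),  # no speaker in common
}

ranked = rank_pairs(conversations)
for s, a, b in ranked:
    print(f"{a} / {b}: {s:.2f}")
```

In a semi-automatic workflow of this kind, a human listener would check only the highest-ranked pairs rather than every pair. How the paper chooses that cut-off, and which scoring strategy it ultimately uses, is not stated in the abstract.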

Bibliographic reference. Manocha, Sandeep / Espy-Wilson, Carol Y. (2007): "A semi-automatic approach for speaker mining of tapped telephone conversations", in INTERSPEECH-2007, 2009-2012.