Fully Automatic Speaker Separation System, with Automatic Enrolling of Recurrent Speakers

Raphael Cohen, Orgad Keller, Jason Levy, Russell Levy, Micha Breakstone, Amit Ashkenazi


We present a system for speaker separation and identification that is designed to operate without any effort from the end user. Single-channel conversations are transformed into i-vectors, clustered into speakers, and matched against a database of known speakers. Enrollment is automatic: a voice print is constructed for the recording user by taking advantage of the metadata that identifies that user's conversations. When available, additional sources such as video and the ASR-transcribed content are also used to identify speakers. We describe the system architecture and a novel unsupervised enrollment algorithm, and discuss the difficulties encountered in solving this problem.
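
The cluster-then-match flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-segment embeddings (standing in for i-vectors) have already been extracted, it swaps in a simple greedy cosine-similarity clustering, and the function names, thresholds, and toy vectors are all hypothetical.

```python
import numpy as np


def _unit(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cluster_segments(embeddings, threshold=0.5):
    """Greedy clustering (an assumed stand-in, not the paper's method).

    Each segment joins the most similar existing cluster if the cosine
    similarity to its centroid reaches `threshold`; otherwise it opens a
    new cluster. Expects at least one embedding.
    """
    sums = []    # running sum of unit vectors per cluster
    labels = []  # cluster index assigned to each segment
    for e in _unit(embeddings):
        sims = [float(e @ _unit(s)) for s in sums]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            sums[k] = sums[k] + e
        else:
            k = len(sums)
            sums.append(e.copy())
        labels.append(k)
    return labels, _unit(np.stack(sums))


def match_speakers(centroids, voice_prints, threshold=0.7):
    """Match each cluster centroid to the closest enrolled voice print.

    `voice_prints` maps a speaker name to an embedding. A cluster whose
    best similarity is below `threshold` is left unidentified (None).
    """
    names = list(voice_prints)
    if not names:
        return [None] * len(centroids)
    prints = _unit(np.stack([voice_prints[n] for n in names]))
    matches = []
    for c in centroids:
        sims = prints @ c
        best = int(np.argmax(sims))
        matches.append(names[best] if sims[best] >= threshold else None)
    return matches


# Toy 3-dimensional "embeddings" for five segments of one conversation.
segments = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.95, 0.05], [1, 0.05, 0]]
labels, centroids = cluster_segments(segments)

# Hypothetical database of already enrolled speakers.
known = {"alice": [1, 0, 0], "bob": [0, 0, 1]}
speakers = match_speakers(centroids, known)
```

In this toy run the first cluster is matched to an enrolled speaker and the second is left unidentified. In a system like the one described, an unidentified cluster would be a candidate for automatic enrollment, but how that decision is made is not specified here.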


Cite as: Cohen, R., Keller, O., Levy, J., Levy, R., Breakstone, M., Ashkenazi, A. (2018) Fully Automatic Speaker Separation System, with Automatic Enrolling of Recurrent Speakers. Proc. Interspeech 2018, 1964-1965.


@inproceedings{Cohen2018,
  author={Raphael Cohen and Orgad Keller and Jason Levy and Russell Levy and Micha Breakstone and Amit Ashkenazi},
  title={Fully Automatic Speaker Separation System, with Automatic Enrolling of Recurrent Speakers},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1964--1965}
}