ISCA Archive Interspeech 2015

Speaker diarization with i-vectors from DNN senone posteriors

Gregory Sell, Daniel Garcia-Romero, Alan McCree

Motivated by recent gains in speaker identification obtained by incorporating senone posteriors from deep neural networks (DNNs) into i-vector extraction, we examine similar enhancements to speaker diarization with i-vector clustering. We evaluate two DNNs with different numbers of senone targets, each in combination with a diagonal or full covariance universal background model (UBM), on the multilingual CALLHOME corpus. Results show that the larger DNN with a full covariance UBM gives the best performance. The improvements appear to depend strongly on the number of speakers in a conversation and less on language. Overall, when combined with resegmentation, the proposed system improves CALLHOME performance to a diarization error rate (DER) of 10.3%.
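To make the idea of feeding senone posteriors into i-vector extraction more concrete, the sketch below shows one common formulation: per-frame class posteriors are used as soft alignments when accumulating the zeroth- and first-order statistics that an i-vector extractor consumes. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `baum_welch_stats`, the list-of-lists inputs, and the toy values are hypothetical, and the abstract does not specify how the statistics are actually computed.

```python
def baum_welch_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics from soft alignments.

    posteriors: list of T rows, each a list of K per-class posteriors
                (assumed here to be DNN senone outputs for each frame).
    features:   list of T rows, each a list of D acoustic feature values.

    Returns (zeroth, first), where zeroth[k] is the sum over frames of
    the posterior for class k, and first[k][d] is the posterior-weighted
    sum of feature dimension d for class k.
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must have the same number of frames")
    if not posteriors:
        raise ValueError("need at least one frame")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for gamma_t, x_t in zip(posteriors, features):
        for k, gamma in enumerate(gamma_t):
            zeroth[k] += gamma
            for d, x in enumerate(x_t):
                first[k][d] += gamma * x
    return zeroth, first


# Toy example: 2 frames, 2 classes, 2 feature dimensions (made-up values).
toy_posteriors = [[1.0, 0.0], [0.5, 0.5]]
toy_features = [[2.0, 4.0], [1.0, 3.0]]
zeroth, first = baum_welch_stats(toy_posteriors, toy_features)
```

In a conventional system the same accumulation would use posteriors from the UBM's Gaussian components; the only change sketched here is the source of the per-frame alignments.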


doi: 10.21437/Interspeech.2015-109

Cite as: Sell, G., Garcia-Romero, D., McCree, A. (2015) Speaker diarization with i-vectors from DNN senone posteriors. Proc. Interspeech 2015, 3096-3099, doi: 10.21437/Interspeech.2015-109

@inproceedings{sell15_interspeech,
  author={Gregory Sell and Daniel Garcia-Romero and Alan McCree},
  title={{Speaker diarization with i-vectors from DNN senone posteriors}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3096--3099},
  doi={10.21437/Interspeech.2015-109}
}