Motivated by recent gains in speaker identification from incorporating senone posteriors from deep neural networks (DNNs) into i-vector extraction, we examine similar enhancements to speaker diarization with i-vector clustering. We evaluate two DNNs with different numbers of senone targets, each in combination with a diagonal or full covariance universal background model (UBM), on the multilingual CALLHOME corpus. Results show that the larger DNN with a full covariance UBM gives the best performance. The improvements appear to depend strongly on the number of speakers in a conversation, and less on language. Overall, when combined with resegmentation, the proposed system improves CALLHOME performance to a diarization error rate (DER) of 10.3%.
Cite as: Sell, G., Garcia-Romero, D., McCree, A. (2015) Speaker diarization with i-vectors from DNN senone posteriors. Proc. Interspeech 2015, 3096-3099, doi: 10.21437/Interspeech.2015-109
@inproceedings{sell15_interspeech,
  author={Gregory Sell and Daniel Garcia-Romero and Alan McCree},
  title={{Speaker diarization with i-vectors from DNN senone posteriors}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3096--3099},
  doi={10.21437/Interspeech.2015-109}
}