16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Speaker Diarization with I-Vectors from DNN Senone Posteriors

Gregory Sell, Daniel Garcia-Romero, Alan McCree

Johns Hopkins University, USA

Motivated by recent gains in speaker identification from incorporating senone posteriors from deep neural networks (DNNs) into i-vector extraction, we examine similar enhancements to speaker diarization with i-vector clustering. We evaluate two DNNs with different numbers of senone targets, each in combination with a diagonal- or full-covariance universal background model (UBM), on the multilingual CALLHOME corpus. Results show that the larger DNN with a full-covariance UBM gives the best performance. The improvements appear to depend strongly on the number of speakers in a conversation and less on language. Overall, when combined with resegmentation, the proposed system improves CALLHOME performance to a diarization error rate (DER) of 10.3%.
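
The abstract does not spell out how senone posteriors enter i-vector extraction. In the formulation commonly used for DNN-based i-vectors, the per-frame posteriors that would normally come from the UBM components are replaced by the DNN's senone posteriors when accumulating the zeroth- and first-order Baum-Welch statistics. The sketch below illustrates only that accumulation step, assuming this formulation applies here; the function name `accumulate_stats` and the toy inputs are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def accumulate_stats(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics for one segment.

    features:   T frames, each a D-dimensional acoustic feature vector.
    posteriors: T frames, each a C-dimensional posterior vector. In the
                assumed DNN-based setup these are senone posteriors
                rather than UBM component posteriors.

    Returns (N, F), where N[c] = sum_t gamma_t(c) and
    F[c] = sum_t gamma_t(c) * x_t.
    """
    if not features or len(features) != len(posteriors):
        raise ValueError("features and posteriors must be non-empty and aligned")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for x, gamma in zip(features, posteriors):
        for c, g in enumerate(gamma):
            zeroth[c] += g              # soft frame count for class c
            for d, value in enumerate(x):
                first[c][d] += g * value  # posterior-weighted feature sum

    return zeroth, first


# Toy example: 2 frames, 2-dimensional features, 2 senone classes.
toy_features = [[1.0, 2.0], [3.0, 4.0]]
toy_posteriors = [[1.0, 0.0], [0.5, 0.5]]
N, F = accumulate_stats(toy_features, toy_posteriors)
```

These statistics would then feed a standard i-vector extractor; the UBM covariance type (diagonal or full) compared in the abstract affects that later step, which is not sketched here.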

Bibliographic reference. Sell, Gregory / Garcia-Romero, Daniel / McCree, Alan (2015): "Speaker diarization with i-vectors from DNN senone posteriors", in INTERSPEECH-2015, 3096-3099.