Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Coefficients (MFCCs). Features such as i-vectors have been used on longer segments (a minimum of 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial because they model speaker information explicitly. In this paper, the i-vector modelling technique is adapted to provide short-term features for diarization by estimating i-vectors over a short window of MFCCs. The Information Bottleneck (IB) approach provides a convenient platform for integrating multiple features for fast and accurate diarization of speech. Speaker models are estimated over a window of 10 frames of speech and used as features in the IB system. Experiments on the NIST RT datasets show an absolute improvement of 3.9% in the best case when i-vectors are used as auxiliary features to MFCCs. Further, discriminative training algorithms such as LDA and PLDA are applied to the i-vectors, giving a best-case improvement of 5% absolute on the RT datasets.
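The short-window estimation described in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM universal background model and a total variability matrix, and computes the standard i-vector point estimate for each 10-frame window of MFCCs. All names (`ivector`, `windowed_ivectors`, `T_mat`), the hop of one frame, and the model parameters are hypothetical and chosen for illustration; this is not necessarily how the authors' online extractor is implemented.

```python
import numpy as np

def ivector(X, weights, means, variances, T_mat):
    """Point estimate of the i-vector for frames X (n_frames x D).

    weights: (C,), means/variances: (C, D) of a diagonal-covariance UBM.
    T_mat: (C*D, R) total variability matrix, rows grouped by component.
    """
    C, D = means.shape
    R = T_mat.shape[1]

    # Frame-level log-likelihoods under each UBM component, shape (n_frames, C).
    diff = X[:, None, :] - means[None, :, :]
    log_p = np.log(weights) - 0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )

    # Component posteriors (softmax over components, numerically stabilised).
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Zeroth-order and centred first-order Baum-Welch statistics.
    N = gamma.sum(axis=0)                      # (C,)
    F = gamma.T @ X - N[:, None] * means       # (C, D)

    # Posterior precision: I + sum_c N_c T_c^T Sigma_c^{-1} T_c.
    T_c = T_mat.reshape(C, D, R)
    T_scaled = T_c / variances[:, :, None]     # Sigma_c^{-1} T_c
    precision = np.eye(R) + np.einsum("c,cdr,cds->rs", N, T_scaled, T_c)

    # Posterior mean: precision^{-1} sum_c T_c^T Sigma_c^{-1} F_c.
    rhs = np.einsum("cdr,cd->r", T_scaled, F)
    return np.linalg.solve(precision, rhs)

def windowed_ivectors(mfcc, weights, means, variances, T_mat, win=10, hop=1):
    """One i-vector per sliding window of `win` frames (hop is an assumption)."""
    starts = range(0, mfcc.shape[0] - win + 1, hop)
    return np.stack(
        [ivector(mfcc[s:s + win], weights, means, variances, T_mat) for s in starts]
    )
```

In practice the UBM and total variability matrix would be trained on speech data; with random parameters the function only demonstrates the shapes and the computation. The resulting per-window vectors could then be supplied alongside the MFCCs as a second feature stream.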
Cite as: Madikeri, S., Himawan, I., Motlicek, P., Ferras, M. (2015) Integrating online i-vector extractor with information bottleneck based speaker diarization system. Proc. Interspeech 2015, 3105-3109, doi: 10.21437/Interspeech.2015-111
@inproceedings{madikeri15_interspeech,
  author={Srikanth Madikeri and Ivan Himawan and Petr Motlicek and Marc Ferras},
  title={{Integrating online i-vector extractor with information bottleneck based speaker diarization system}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3105--3109},
  doi={10.21437/Interspeech.2015-111}
}