Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features

Abraham Woubie, Jordi Luque, Javier Hernando


i-vector modeling techniques have recently been used successfully for the speaker clustering task. In this work, we propose extracting i-vectors from both short- and long-term speech features and fusing their PLDA scores within the framework of speaker diarization. Two sets of i-vectors are first extracted: one from short-term spectral features, and one from long-term voice-quality, prosodic and glottal-to-noise excitation ratio (GNE) features. The PLDA scores of these two sets of i-vectors are then fused for the speaker clustering task. Experiments have been carried out on the single- and multiple-site scenario test sets of the Augmented Multi-party Interaction (AMI) corpus. Experimental results show that the i-vector-based PLDA speaker clustering technique provides a significant diarization error rate (DER) improvement over the GMM-based BIC clustering technique.
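The abstract does not state the exact fusion rule or clustering procedure. The following Python sketch shows one common way the idea could look, under stated assumptions: a weighted linear fusion of two segment-by-segment PLDA score matrices (short-term and long-term), using a hypothetical weight `alpha`, followed by a simple average-linkage agglomerative clustering that stops when the best fused score falls below a hypothetical `threshold`. The function names `fuse_scores` and `agglomerative_cluster` are illustrative. The score values are toy numbers, not data from the paper. A real system would typically re-estimate i-vectors and rescore after each merge, and this sketch skips that step.

```python
from typing import List

Matrix = List[List[float]]


def fuse_scores(short_term: Matrix, long_term: Matrix, alpha: float) -> Matrix:
    """Assumed linear score fusion: s = (1 - alpha) * s_short + alpha * s_long."""
    n = len(short_term)
    return [
        [(1.0 - alpha) * short_term[i][j] + alpha * long_term[i][j] for j in range(n)]
        for i in range(n)
    ]


def agglomerative_cluster(scores: Matrix, threshold: float) -> List[List[int]]:
    """Merge the pair of clusters with the highest average-linkage score
    until the best score drops below the (hypothetical) stopping threshold."""
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > 1:
        best_score, best_pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage over all cross-cluster segment pairs.
                pair = [scores[i][j] for i in clusters[a] for j in clusters[b]]
                s = sum(pair) / len(pair)
                if best_score is None or s > best_score:
                    best_score, best_pair = s, (a, b)
        if best_score < threshold:
            break  # No pair is similar enough to be the same speaker.
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid.
    return clusters


# Toy example: segments 0 and 1 come from one speaker, segments 2 and 3 from another.
short = [[0, 5, -3, -4], [5, 0, -2, -3], [-3, -2, 0, 4], [-4, -3, 4, 0]]
long_ = [[0, 3, -1, -2], [3, 0, -2, -1], [-1, -2, 0, 2], [-2, -1, 2, 0]]
fused = fuse_scores(short, long_, alpha=0.3)
clusters = agglomerative_cluster(fused, threshold=0.0)
print(clusters)  # [[0, 1], [2, 3]]
```

In practice, the fusion weight and stopping threshold would be tuned on development data. Linear fusion is chosen here only because it is the simplest score-level combination.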


DOI: 10.21437/Interspeech.2016-339

Cite as

Woubie, A., Luque, J., Hernando, J. (2016) Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features. Proc. Interspeech 2016, 372-376.

BibTeX
@inproceedings{Woubie+2016,
author={Abraham Woubie and Jordi Luque and Javier Hernando},
title={Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-339},
url={http://dx.doi.org/10.21437/Interspeech.2016-339},
pages={372--376}
}