ISCA Archive Odyssey 2016

Short- and Long-Term Speech Features for Hybrid HMM-i-Vector based Speaker Diarization System

Abraham Woubie Zewoudie, Jordi Luque, Javier Hernando

i-vectors have been successfully applied in speaker recognition tasks over recent years. This work assesses the suitability of i-vector modeling within the framework of the speaker diarization task. In this context, a weighted cosine distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two different i-vectors. Whereas the first i-vector represents the distribution of the commonly used short-term Mel Frequency Cepstral Coefficients, the second one represents a selection of voice quality and prosodic features. To combine short- and long-term speech statistics, the cosine-distance scores of those two i-vectors are linearly weighted to obtain a single similarity score. The final fused score is then used as the speaker clustering distance. Our experimental results on two different evaluation sets of the Augmented Multi-party Interaction corpus show the suitability of combining both sources of information within the i-vector space. They also show that the i-vector based clustering technique provides a significant improvement, in terms of diarization error rate, over clustering based on Gaussian Mixture Modeling. Furthermore, this work reports a significant speaker error reduction when short-term based i-vector clustering is augmented with a second i-vector estimated from voice quality and prosody related speech features.
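
The linear weighting of the two cosine scores described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default weight of 0.8, and the assumption that the two weights sum to one are all hypothetical choices made for illustration, and the i-vectors are represented as plain lists of floats.

```python
import math


def cosine_score(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fused_score(short_a, short_b, long_a, long_b, weight=0.8):
    """Linearly weight two cosine scores into one similarity score.

    short_a, short_b: i-vectors of two clusters from short-term (MFCC) features.
    long_a, long_b: i-vectors of the same clusters from voice quality and
        prosodic features.
    weight: hypothetical weight on the short-term score; the long-term score
        receives 1 - weight (an assumption, not taken from the paper).
    """
    short_term = cosine_score(short_a, short_b)
    long_term = cosine_score(long_a, long_b)
    return weight * short_term + (1.0 - weight) * long_term


if __name__ == "__main__":
    # Toy two-dimensional vectors; real i-vectors have many more dimensions.
    print(fused_score([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

In a clustering loop, a score of this kind would be computed for every pair of candidate clusters, and the pair with the highest fused similarity would be the natural candidate to merge.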


doi: 10.21437/Odyssey.2016-58

Cite as: Zewoudie, A.W., Luque, J., Hernando, J. (2016) Short- and Long-Term Speech Features for Hybrid HMM-i-Vector based Speaker Diarization System. Proc. The Speaker and Language Recognition Workshop (Odyssey 2016), 400-406, doi: 10.21437/Odyssey.2016-58

@inproceedings{zewoudie16_odyssey,
  author={Abraham Woubie Zewoudie and Jordi Luque and Javier Hernando},
  title={{Short- and Long-Term Speech Features for Hybrid HMM-i-Vector based Speaker Diarization System}},
  year=2016,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2016)},
  pages={400--406},
  doi={10.21437/Odyssey.2016-58}
}