ISCA Archive Odyssey 2012

Utterance partitioning with acoustic vector resampling for i-vector based speaker verification

Wei Rao, Man-Wai Mak

The i-vector has become a state-of-the-art technique for text-independent speaker verification. The major advantage of i-vectors is that they can represent speaker-dependent information in a low-dimensional Euclidean space, which opens up the opportunity to use statistical techniques to suppress session- and channel-variability. This paper investigates the effect of varying the conversation length and the number of training sessions per speaker on the discriminative ability of i-vectors. The paper demonstrates that the amount of speaker-dependent information an i-vector can capture saturates once the utterance length exceeds a certain threshold. This finding motivates us to maximize the feature representation capability of i-vectors by partitioning a long conversation into a number of sub-utterances, thereby producing more i-vectors per conversation. Results on NIST 2010 SRE suggest that (1) using more i-vectors per conversation enhances the capability of LDA and WCCN to suppress session variability, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps i-vector based SVMs find better decision boundaries, enabling SVM scoring to outperform cosine distance scoring by 22% and 9% in terms of minimum normalized DCF and EER, respectively.
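The abstract does not spell out how the acoustic vectors are resampled. The sketch below only illustrates the general idea of turning one long conversation into several sub-utterances, under stated assumptions. It assumes that each of R resampling passes randomly shuffles the frame order and then splits the frames into N nearly equal partitions, and that the full-length utterance is kept alongside the R×N sub-utterances, giving R×N + 1 utterances. The function name `partition_with_resampling` and its parameters `n_partitions` (N), `n_repeats` (R) and `seed` are hypothetical. Each returned sub-utterance would then be passed to an i-vector extractor, which is not shown.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")  # one acoustic vector (e.g., an MFCC frame)


def partition_with_resampling(
    frames: Sequence[Frame],
    n_partitions: int,
    n_repeats: int,
    seed: Optional[int] = None,
) -> List[List[Frame]]:
    """Hypothetical sketch: return R*N + 1 utterances (full utterance + N parts per pass).

    Assumption: each pass shuffles the frame order, then splits it into N chunks.
    """
    if n_partitions < 1 or n_repeats < 1:
        raise ValueError("n_partitions and n_repeats must be positive")
    if len(frames) < n_partitions:
        raise ValueError("need at least one frame per partition")

    rng = random.Random(seed)
    subutterances = [list(frames)]  # keep the full-length utterance as well

    for _ in range(n_repeats):
        order = list(range(len(frames)))
        rng.shuffle(order)  # resampling step: randomize the frame order

        # Split the shuffled indices into N nearly equal, disjoint chunks.
        base, extra = divmod(len(order), n_partitions)
        start = 0
        for k in range(n_partitions):
            size = base + (1 if k < extra else 0)
            chunk = order[start:start + size]
            subutterances.append([frames[i] for i in chunk])
            start += size

    return subutterances


if __name__ == "__main__":
    # Ten dummy 3-dimensional acoustic vectors stand in for a real conversation.
    frames = [[float(t)] * 3 for t in range(10)]
    subs = partition_with_resampling(frames, n_partitions=4, n_repeats=2, seed=0)
    print(len(subs))  # 2 * 4 + 1 = 9 utterances
    print([len(s) for s in subs])
```

Shuffling before splitting is harmless here because the Baum-Welch statistics used for i-vector extraction are sums over frames and ignore frame order. Each pass therefore yields random, disjoint subsets of frames, and repeating the pass R times produces more i-vectors per conversation than a single split would.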

Index Terms: speaker verification, i-vectors, utterance partitioning, support vector machines.


Cite as: Rao, W., Mak, M.-W. (2012) Utterance partitioning with acoustic vector resampling for i-vector based speaker verification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2012), 165-171

@inproceedings{rao12_odyssey,
  author={Wei Rao and Man-Wai Mak},
  title={{Utterance partitioning with acoustic vector resampling for i-vector based speaker verification}},
  year=2012,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2012)},
  pages={165--171}
}