15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Clustering-Based i-Vector Formulation for Speaker Recognition

Hung-Shin Lee (1), Yu Tsao (2), Hsin-Min Wang (2), Shyh-Kang Jeng (1)

(1) National Taiwan University, Taiwan
(2) Academia Sinica, Taiwan

In this paper, we first reformulate the derivation of the conventional i-vector scheme, the state-of-the-art utterance representation for speaker verification, as the modeling of universal background model (UBM)-based mixtures of factor analyzers (UMFA). We then propose a clustering-based extension of UMFA called CMFA. In UMFA, each analyzer is characterized by a subspace, and an utterance is assumed to have the same projection coordinate in every subspace; this shared coordinate is the i-vector. We relax this assumption by grouping the mixture components of the UBM into clusters according to their acoustic traits. In CMFA, each utterance is therefore represented by multiple i-vectors, each generated by the similar subspaces associated with the same cluster. We also investigate two strategies for merging these i-vectors into a single vector that can be used by the classifier of the conventional i-vector framework. Experiments on the male portion of the core task of the NIST 2005 Speaker Recognition Evaluation (SRE), evaluated in terms of the minimum normalized decision cost function (minDCF) and the equal error rate (EER), demonstrate the merits of the new i-vector method over the conventional one.
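
The contrast between the two models can be written compactly. The following is only a sketch under assumed notation, which is introduced here for illustration and is not taken from the paper: C is the number of UBM mixture components, m_c and T_c are the mean and subspace (loading matrix) of component c, g(c) is a hypothetical map from a component to one of K clusters, and w denotes an i-vector of utterance u.

```latex
% Assumed notation for illustration; not taken from the paper.
% UMFA (conventional i-vector): one coordinate w(u) shared by all C subspaces.
\mu_c(u) = m_c + T_c \, w(u), \qquad c = 1, \dots, C

% CMFA: components are grouped by g : \{1,\dots,C\} \to \{1,\dots,K\},
% and only components in the same cluster share a coordinate.
\mu_c(u) = m_c + T_c \, w_{g(c)}(u), \qquad c = 1, \dots, C

% The utterance is then described by K i-vectors
% w_1(u), \dots, w_K(u), which are later merged into a single vector.
```

With K = 1 the second form reduces to the first, which is one way to read CMFA as a relaxation of the shared-coordinate assumption.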

Bibliographic reference.  Lee, Hung-Shin / Tsao, Yu / Wang, Hsin-Min / Jeng, Shyh-Kang (2014): "Clustering-based i-vector formulation for speaker recognition", In INTERSPEECH-2014, 1101-1105.