In this paper, we first reformulate the derivation of the conventional i-vector scheme, the state-of-the-art utterance representation for speaker verification, as the modeling of universal background model (UBM)-based mixtures of factor analyzers (UMFA), and then propose a clustering-based UMFA method called CMFA. In UMFA, each analyzer is characterized by a subspace, and an utterance is assumed to have the same projection coordinate in every subspace; this shared coordinate is the i-vector. We relax this assumption by grouping the mixture components of the UBM into clusters according to their acoustic traits. Consequently, in CMFA, each utterance is represented by multiple i-vectors, each of which is generated by the similar subspaces associated with the same cluster. We also investigate two strategies for merging these i-vectors into a single vector that can be used with the classifier of the conventional i-vector framework. Experiments conducted on the male portion of the core task in the NIST 2005 Speaker Recognition Evaluation (SRE), evaluated in terms of the minimum normalized decision cost function (minDCF) and the equal error rate (EER), demonstrate the merits of the new i-vector method over the conventional i-vector method.
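The relaxation described above can be written out as a minimal sketch. The notation is assumed here for illustration and is not taken from the paper: $\mathbf{m}_c$ is the mean of UBM component $c$, $\mathbf{T}_c$ is the loading matrix spanning that component's subspace, $\boldsymbol{\mu}_c$ is the utterance-dependent mean, $C$ is the number of components, $K$ is the number of clusters, and $k(c)$ is the cluster to which component $c$ is assigned. The standard normal prior on each per-cluster vector is likewise an assumption carried over from the conventional i-vector model.

```latex
% UMFA (conventional i-vector): a single coordinate w is shared by all C analyzers
\boldsymbol{\mu}_c = \mathbf{m}_c + \mathbf{T}_c \mathbf{w},
\qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad c = 1, \dots, C

% CMFA (assumed form): components in the same cluster k(c) share a coordinate w_{k(c)}
\boldsymbol{\mu}_c = \mathbf{m}_c + \mathbf{T}_c \mathbf{w}_{k(c)},
\qquad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad k = 1, \dots, K
```

Under this reading, setting $K = 1$ recovers the conventional model, and the vectors $\mathbf{w}_1, \dots, \mathbf{w}_K$ are the multiple i-vectors that are subsequently merged into one.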
Bibliographic reference. Lee, Hung-Shin / Tsao, Yu / Wang, Hsin-Min / Jeng, Shyh-Kang (2014): "Clustering-based i-vector formulation for speaker recognition", In INTERSPEECH-2014, 1101-1105.