Supervised I-vector Modeling - Theory and Applications

Shreyas Ramoji, Sriram Ganapathy


Over the last decade, factor analysis based modeling of a variable-length speech utterance as a fixed-dimensional vector (termed the i-vector) has been used prominently for many tasks such as speaker recognition, language recognition, and even speech recognition. The i-vector model is an unsupervised learning paradigm in which the data are first clustered using a Gaussian Mixture Model Universal Background Model (GMM-UBM). The adapted means of the Gaussian mixture components are then dimensionality-reduced using the Total Variability Matrix (TVM), where the latent variables are modeled with a single Gaussian distribution. In this paper, we propose to rework the theory of i-vector modeling in a supervised framework where each speech utterance is associated with a label. Class labels are introduced into the i-vector model using a mixture Gaussian prior. We show that the proposed model is a generalized i-vector model and that the conventional i-vector model is a special case of it. The model is applied to a language recognition task using the NIST Language Recognition Evaluation (LRE) 2017 dataset. In these experiments, the supervised i-vector model provides significant improvements over the conventional i-vector model (average relative improvement of 5% in terms of C_{avg}).
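As a rough sketch of the setup the abstract describes, the contrast between the conventional and supervised models can be written in standard i-vector notation (the symbols below follow common usage in the i-vector literature and are an assumption, not quoted from the paper):

```latex
% Conventional i-vector model: the utterance-dependent GMM mean supervector M
% is a low-rank shift of the UBM supervector m via the TVM T, with a
% single-Gaussian prior on the latent i-vector w.
\begin{align}
  M &= m + T w, & w &\sim \mathcal{N}(0, I)
\end{align}
% Supervised variant with a mixture Gaussian prior over the C class labels
% (mixture weights pi_c, per-class means mu_c and covariances Sigma_c):
\begin{align}
  M &= m + T w, & w &\sim \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mu_c, \Sigma_c)
\end{align}
% The conventional model is recovered as the special case
% C = 1, \mu_1 = 0, \Sigma_1 = I.
```

The reduction in the last line is what makes the conventional i-vector model a special case of the supervised formulation, as claimed in the abstract.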


DOI: 10.21437/Interspeech.2018-2012

Cite as: Ramoji, S., Ganapathy, S. (2018) Supervised I-vector Modeling - Theory and Applications. Proc. Interspeech 2018, 1091-1095, DOI: 10.21437/Interspeech.2018-2012.


@inproceedings{Ramoji2018,
  author={Shreyas Ramoji and Sriram Ganapathy},
  title={Supervised I-vector Modeling - Theory and Applications},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1091--1095},
  doi={10.21437/Interspeech.2018-2012},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2012}
}