In text-independent speaker verification, it has been shown effective to represent the variable-length and information rich speech utterances using fixed-dimensional vectors, for instance, in the form of i-vectors. An i-vector is a low-dimensional vector in the so-called total variability space represented with a thin and tall rectangular matrix. Taking each row of the total variability matrix as a random vector, we look into the redundancy in representing the total variability space. We show that the total variability matrix is compressible and such characteristic could be exploited to reduce the memory and computational requirement in i-vector extraction. We also show that the existing sparse coding and dictionary learning techniques could be easily adapted for this purpose. Experiments on NIST SRE'10 dataset confirm that the total variability matrix could be represented with a smaller matrix without affecting the performance.
Bibliographic reference. Xu, Longting / Lee, Kong Aik / Li, Haizhou / Yang, Zhen (2015): "Sparse coding of total variability matrix", In INTERSPEECH-2015, 1022-1026.