Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification

Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee


State-of-the-art speaker verification systems use the total variability model to represent the acoustic space compactly. However, short-duration utterances contain only limited phonetic content, so the total variability model may capture an incomplete representation, leading to poor speaker verification performance. In this paper, a technique to incorporate component-wise local acoustic variability information into the speaker verification framework is proposed. Specifically, Gaussian Probabilistic Linear Discriminant Analysis (G-PLDA) of the supervector space, with a block-diagonal covariance assumption, is used in conjunction with the traditional total variability model. Experimental results obtained using the NIST SRE 2010 dataset show that incorporating the proposed method leads to relative improvements of 20.48% and 18.99% in the 3-second condition for male and female speech, respectively.
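
The block-diagonal covariance assumption on the supervector space can be illustrated with a minimal sketch. It is not the authors' implementation and it does not perform G-PLDA training. It only shows what the assumption means structurally: a supervector is treated as a concatenation of per-component sub-vectors, covariance is estimated within each component, and all entries coupling different components are fixed at zero. The function name `block_diagonal_covariance`, the sizes `C` and `F`, and the random toy data are hypothetical and chosen purely for illustration.

```python
import random


def block_diagonal_covariance(supervectors, num_components, feat_dim):
    """Estimate one covariance block per mixture component.

    supervectors: list of lists, each of length num_components * feat_dim.
    Returns a (C*F) x (C*F) matrix as nested lists. Entries that couple
    different components stay at zero (the block-diagonal assumption).
    """
    n = len(supervectors)
    dim = num_components * feat_dim
    mean = [sum(sv[i] for sv in supervectors) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for c in range(num_components):
        lo, hi = c * feat_dim, (c + 1) * feat_dim
        # Only pairs of dimensions inside the same component are estimated.
        for i in range(lo, hi):
            for j in range(lo, hi):
                cov[i][j] = sum(
                    (sv[i] - mean[i]) * (sv[j] - mean[j]) for sv in supervectors
                ) / (n - 1)
    return cov


# Toy data (assumption): 50 random "supervectors", 3 components of dimension 2.
random.seed(0)
C, F = 3, 2
data = [[random.gauss(0.0, 1.0) for _ in range(C * F)] for _ in range(50)]
cov = block_diagonal_covariance(data, C, F)
```

With C components of dimension F, this structure has C·F² covariance entries to estimate rather than (C·F)², which is one way to read "component-wise local" variability: each block describes variability around a single mixture component only.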


DOI: 10.21437/Interspeech.2017-266

Cite as: Ma, J., Sethu, V., Ambikairajah, E., Lee, K.A. (2017) Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification. Proc. Interspeech 2017, 1502-1506, DOI: 10.21437/Interspeech.2017-266.


@inproceedings{Ma2017,
  author={Jianbo Ma and Vidhyasaharan Sethu and Eliathamby Ambikairajah and Kong Aik Lee},
  title={Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1502--1506},
  doi={10.21437/Interspeech.2017-266},
  url={http://dx.doi.org/10.21437/Interspeech.2017-266}
}