ISCA Archive Odyssey 2014

Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition

Patrick Kenny, Themos Stafylakis, Pierre Ouellet, Vishwa Gupta, Jahangir Alam

We examine the use of Deep Neural Networks (DNNs) in extracting Baum-Welch statistics for i-vector-based text-independent speaker recognition. Instead of training the universal background model with the standard EM algorithm, the components are predefined and correspond to the set of triphone states, whose posterior occupancy probabilities are modeled by a DNN. These assignments are then combined with the standard 60-dimensional MFCC features to calculate first-order Baum-Welch statistics, which are used to train the i-vector extractor and to extract i-vectors. The DNN-based assignments force the i-vectors to capture the idiosyncratic way in which each speaker pronounces each particular triphone state, which can enrich the short-term spectral representation of the standard i-vectors. In experiments with Switchboard data and a baseline PLDA classifier, the proposed i-vectors yield inferior performance compared to the standard ones, but attain a 16% relative improvement when fused with them, indicating that they carry useful complementary information about the speaker's identity. A further experiment with a different DNN configuration attained performance comparable to the baseline i-vectors on NIST 2012 (condition C2, female).
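As a rough illustration of how frame-level posteriors and acoustic features combine into Baum-Welch statistics, the sketch below accumulates them for a single utterance. It is a minimal example under stated assumptions and is not the authors' implementation. The function name `baum_welch_stats` is hypothetical, the posteriors are toy values standing in for DNN outputs over triphone states, and the features are 2-dimensional rather than 60-dimensional MFCCs. The zeroth-order counts are included because i-vector extractors commonly use them alongside the first-order statistics.

```python
def baum_welch_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors: T frames, each a list of C class posteriors
                (assumed here to come from a DNN over triphone states).
    features:   T frames, each a list of D acoustic feature values.
    Returns (zeroth, first), where zeroth[c] = sum_t gamma_t(c) and
    first[c][d] = sum_t gamma_t(c) * x_t[d].
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("need the same non-zero number of frames in both inputs")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma              # soft count of frames for class c
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x    # posterior-weighted feature sum

    return zeroth, first


# Toy input: 3 frames, 2 classes, 2-dimensional features (illustrative values only).
toy_posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_features = [[2.0, 4.0], [1.0, 1.0], [3.0, -1.0]]

zeroth, first = baum_welch_stats(toy_posteriors, toy_features)
print(zeroth)  # [1.5, 1.5]
print(first)   # [[2.5, 4.5], [3.5, -0.5]]
```

The only difference from the conventional pipeline in this sketch is where the per-frame posteriors come from: a GMM-based universal background model would supply them in the standard setup, whereas here they are assumed to be supplied by the DNN.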


doi: 10.21437/Odyssey.2014-44

Cite as: Kenny, P., Stafylakis, T., Ouellet, P., Gupta, V., Alam, J. (2014) Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014), 293-298, doi: 10.21437/Odyssey.2014-44

@inproceedings{kenny14c_odyssey,
  author={Patrick Kenny and Themos Stafylakis and Pierre Ouellet and Vishwa Gupta and Jahangir Alam},
  title={{Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition}},
  year=2014,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)},
  pages={293--298},
  doi={10.21437/Odyssey.2014-44}
}