We propose a novel generative approach to speaker recognition using Boltzmann machines, a fledgling non-Gaussian probabilistic framework that is increasingly gaining attention in several machine learning fields. We show how a modified i-vector representation of speech utterances enables the development of several Boltzmann machine architectures for speaker verification, and we report some preliminary speaker recognition results obtained with one of them, which we refer to as Siamese twins. The Siamese twin architecture is designed to capture correlations between utterances spoken by a single speaker, and it can be regarded as a probabilistic analogue of the well-known cosine distance metric. A relative improvement of 27% is reported on NIST-2010 telephone female data.
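For reference, the cosine distance metric that the Siamese twin architecture is likened to scores a verification trial by the cosine of the angle between the enrollment and test i-vectors. The sketch below is a minimal illustration of that standard scoring rule only, not of the Boltzmann machine classifiers. It assumes i-vectors are plain lists of floats, and the `verify` helper and its decision `threshold` are hypothetical (in practice a threshold would be tuned on development data).

```python
import math
from typing import Sequence


def cosine_score(w_enroll: Sequence[float], w_test: Sequence[float]) -> float:
    """Cosine similarity between two i-vectors; higher suggests the same speaker."""
    if len(w_enroll) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm_e = math.sqrt(sum(a * a for a in w_enroll))
    norm_t = math.sqrt(sum(b * b for b in w_test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm_e * norm_t)


def verify(w_enroll: Sequence[float], w_test: Sequence[float], threshold: float) -> bool:
    """Accept the trial if the cosine score reaches a (hypothetical) tuned threshold."""
    return cosine_score(w_enroll, w_test) >= threshold


if __name__ == "__main__":
    # Toy 2-dimensional "i-vectors" purely for illustration.
    print(cosine_score([1.0, 0.0], [1.0, 0.0]))  # identical direction
    print(cosine_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Because the score depends only on direction, two i-vectors that differ by a positive scale factor receive the maximum score of 1.0.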
Cite as: Stafylakis, T., Kenny, P., Senoussaoui, M., Dumouchel, P. (2012) Preliminary investigation of Boltzmann machine classifiers for speaker recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2012), 109-116
@inproceedings{stafylakis12_odyssey,
  author={Themos Stafylakis and Patrick Kenny and Mohammed Senoussaoui and Pierre Dumouchel},
  title={{Preliminary investigation of Boltzmann machine classifiers for speaker recognition}},
  year=2012,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2012)},
  pages={109--116}
}