INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Tandem Deep Features for Text-Dependent Speaker Verification

Tianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu

Shanghai Jiao Tong University, China

Although deep learning has been used successfully for acoustic modeling in speech recognition, it has not been thoroughly investigated or widely adopted for speaker verification. This paper investigates the use of several types of deep features, in a Tandem fashion, for text-dependent speaker verification. Three types of networks are used to extract deep features: a restricted Boltzmann machine (RBM), a phone-discriminant deep neural network (DNN), and a speaker-discriminant DNN. Hidden-layer outputs from these networks are concatenated with the original acoustic features and used in a GMM-UBM classifier. The systems with Tandem deep features were evaluated on RSR2015, a short-term text-dependent speaker verification task. Experiments showed that the best Tandem deep feature achieved more than 50% relative equal error rate (EER) reduction over the traditional features in a GMM-UBM framework.
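The Tandem construction described in the abstract, where a hidden-layer output is appended to each original acoustic frame, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the single sigmoid hidden layer, the random weights standing in for a trained RBM or DNN, and the toy dimensions. The GMM-UBM back end is not shown.

```python
import math
import random


def hidden_layer_output(frame, weights, biases):
    """Sigmoid hidden-layer activations for one acoustic frame.

    The weights are a stand-in for a trained network (assumption).
    """
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, frame)) + b)))
        for row, b in zip(weights, biases)
    ]


def tandem_features(frames, weights, biases):
    """Append the hidden-layer output to each original acoustic frame."""
    return [frame + hidden_layer_output(frame, weights, biases) for frame in frames]


def relative_eer_reduction(baseline_eer, new_eer):
    """Relative EER reduction, in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy dimensions and random values, chosen only for illustration.
rng = random.Random(0)
acoustic_dim, hidden_dim, num_frames = 4, 3, 5
frames = [[rng.gauss(0, 1) for _ in range(acoustic_dim)] for _ in range(num_frames)]
weights = [[rng.gauss(0, 1) for _ in range(acoustic_dim)] for _ in range(hidden_dim)]
biases = [0.0] * hidden_dim

tandem = tandem_features(frames, weights, biases)
print(len(tandem), len(tandem[0]))  # 5 frames, each of dimension 4 + 3 = 7
```

In a full system, the resulting Tandem frames would replace the plain acoustic frames as input to the GMM-UBM classifier. The helper `relative_eer_reduction` shows how a relative reduction is computed: with made-up numbers, a drop from a 4.0% to a 2.0% EER would be a 50% relative reduction.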


Bibliographic reference. Fu, Tianfan / Qian, Yanmin / Liu, Yuan / Yu, Kai (2014): "Tandem deep features for text-dependent speaker verification", In INTERSPEECH-2014, 1327-1331.