Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Speaker Recognition with Recurrent Neural Networks

Shahla Parveen (1,2), Abdul Qadeer (2), Phil Green (1)

(1) Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, UK
(2) Department of Applied Physics, University of Karachi, Pakistan

We report on the application of recurrent neural nets to an open-set, text-dependent speaker identification task. The motivation for applying recurrent neural nets in this domain is to find out whether their ability to take short-term spectral features as input, yet respond to long-term temporal events, is advantageous for speaker identification.

We use a recurrent net architecture adapted from that introduced by Robinson et al. We introduce a fully-connected hidden layer between the input and state nodes and the output, and show that this hidden layer makes the learning of complex classification tasks more efficient. Training uses back-propagation through time. There is one output unit per speaker, with the training targets corresponding to speaker identity.
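The architecture described above can be sketched as a forward pass in NumPy. This is a minimal illustration only: the layer sizes, weight initialisation, and sigmoid activations are assumptions for the sketch, not details taken from the paper. It shows input and state (recurrent) nodes feeding a fully-connected hidden layer, which feeds one output unit per speaker.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper):
# 13-dim acoustic vectors, 32 state nodes, 64 hidden units, 12 speakers.
n_input, n_state, n_hidden, n_speakers = 13, 32, 64, 12

# Weights: [input + state] -> state, [input + state] -> hidden, hidden -> output.
W_state = rng.normal(0, 0.1, (n_input + n_state, n_state))
W_hidden = rng.normal(0, 0.1, (n_input + n_state, n_hidden))
W_out = rng.normal(0, 0.1, (n_hidden, n_speakers))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(frames):
    """Run one utterance (sequence of acoustic vectors) through the net.

    Returns per-frame speaker activations; in training, every frame's
    target would correspond to the utterance's speaker identity.
    """
    state = np.zeros(n_state)
    outputs = []
    for x in frames:
        xs = np.concatenate([x, state])          # current input + previous state
        state = sigmoid(xs @ W_state)            # new state (the recurrence)
        hidden = sigmoid(xs @ W_hidden)          # fully-connected hidden layer
        outputs.append(sigmoid(hidden @ W_out))  # one output unit per speaker
    return np.array(outputs)

utterance = rng.normal(size=(50, n_input))       # 50 frames of 13-dim features
scores = forward(utterance).mean(axis=0)         # average activations over time
print(scores.shape)                              # one score per speaker
```

Averaging the per-frame outputs over an utterance, as in the last line, is one common way to obtain a single per-speaker score from a frame-level classifier.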

For 12 speakers (a mixture of male and female) we obtain a true acceptance rate of 100% with a false acceptance rate of 4%. For 16 speakers these figures are 94% and 7% respectively. We also investigate the sensitivity of identification accuracy to environmental factors (signal level, change of microphone and band limitation), the choice of acoustic vectors (FFT, LPC or cepstral), the distribution of speakers in the training database, and the inclusion of fundamental frequency. FFT features plus fundamental frequency give the best results.

This performance is shown to compare favorably with studies reported on similar tasks using hidden Markov model techniques.


Bibliographic reference. Parveen, Shahla / Qadeer, Abdul / Green, Phil (2000): "Speaker recognition with recurrent neural networks", in ICSLP-2000, vol. 2, 306-309.