ISCA Archive ICSLP 2000

Speaker recognition with recurrent neural networks

Shahla Parveen, Abdul Qadeer, Phil Green

We report on the application of recurrent neural networks to an open-set, text-dependent speaker identification task. The motivation for applying recurrent nets in this domain is to find out whether their ability to take short-term spectral features as input yet respond to long-term temporal events is advantageous for speaker identification.

We use a recurrent net architecture adapted from that introduced by Robinson, with a fully-connected hidden layer added between the input and state nodes and the output. We show that this hidden layer makes the learning of complex classification tasks more efficient. Training uses backpropagation through time. There is one output unit per speaker, with the training targets corresponding to speaker identity.
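A minimal NumPy sketch (not the authors' code) of this kind of architecture: the input and state nodes feed a fully-connected hidden layer, which in turn drives both the per-speaker output units and the next state. The layer sizes, the tanh nonlinearity, the softmax output, and the per-frame score averaging are illustrative assumptions.

```python
import numpy as np

def init_params(n_in, n_state, n_hidden, n_speakers, seed=0):
    """Random weights for a Robinson-style recurrent net with an
    extra fully-connected hidden layer (sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    r = lambda *shape: 0.1 * rng.standard_normal(shape)
    return {
        "W_h": r(n_hidden, n_in + n_state),   # (input, state) -> hidden
        "W_o": r(n_speakers, n_hidden),       # hidden -> one output per speaker
        "W_s": r(n_state, n_hidden),          # hidden -> next state
    }

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def identify(params, frames, n_state):
    """Run the net over a sequence of acoustic vectors and average the
    per-frame speaker scores (training itself would use BPTT, which is
    omitted here)."""
    state = np.zeros(n_state)
    scores = []
    for x in frames:
        h = np.tanh(params["W_h"] @ np.concatenate([x, state]))
        scores.append(softmax(params["W_o"] @ h))
        state = np.tanh(params["W_s"] @ h)
    avg = np.mean(scores, axis=0)
    return avg, int(np.argmax(avg))
```

For example, with 12 enrolled speakers and a 20-frame utterance of 16-dimensional features, `identify` returns a 12-way score vector and the index of the best-scoring speaker.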

For 12 speakers (a mixture of male and female) we obtain a true acceptance rate of 100% with a false acceptance rate of 4%. For 16 speakers these figures are 94% and 7% respectively. We also investigate the sensitivity of identification accuracy to environmental factors (signal level, change of microphone, and band limitation), the choice of acoustic vectors (FFT, LPC, or cepstral), the distribution of speakers in the training database, and the inclusion of fundamental frequency. FFT features plus fundamental frequency give the best results.
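In an open-set task the best-scoring enrolled speaker is accepted only if that score clears a threshold; otherwise the utterance is rejected as coming from an impostor. The threshold trades true acceptances against false acceptances (the 100%/4% figures above). A sketch of such a decision rule, with an arbitrary threshold value not taken from the paper:

```python
def open_set_decision(avg_scores, threshold=0.5):
    """Accept the top-scoring enrolled speaker only if its averaged
    score reaches the threshold; otherwise reject as an impostor.
    Returns the speaker index, or None for a rejection."""
    best = max(range(len(avg_scores)), key=lambda i: avg_scores[i])
    return best if avg_scores[best] >= threshold else None
```

Raising the threshold lowers the false acceptance rate at the cost of rejecting more genuine speakers.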

This performance compares favorably with studies reported on similar tasks using Hidden Markov Model techniques.

Cite as: Parveen, S., Qadeer, A., Green, P. (2000) Speaker recognition with recurrent neural networks. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 306-309

@inproceedings{parveen00_icslp,
  author={Shahla Parveen and Abdul Qadeer and Phil Green},
  title={{Speaker recognition with recurrent neural networks}},
  year={2000},
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 2, 306-309}
}