ISCA Archive NOLISP 2007

Speaker recognition via nonlinear discriminant features

Lara Stoll, Joe Frankel, Nikki Mirghafori

We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but do not yield further gains in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that the contributions of the MLP systems are no longer additive.
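To make the Speaker-MLP idea concrete, the sketch below (not the authors' code) shows hidden-layer activations of an MLP being used as discriminant features for an SVM speaker classifier. The layer sizes, the tanh nonlinearity, frame averaging, and the linear kernel are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the Speaker-MLP -> SVM pipeline described in the abstract.
# All dimensions, weights, and the frame-averaging step are hypothetical.
import numpy as np
from sklearn.svm import SVC

def mlp_hidden_activations(cepstra, W1, b1):
    """Forward cepstral frames (n_frames x n_ceps) through the first MLP layer
    and return the hidden activations used as features."""
    return np.tanh(cepstra @ W1 + b1)

# Toy dimensions: 19 cepstral coefficients, 100 hidden units (assumptions).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((19, 100)) * 0.1, np.zeros(100)

def utterance_vector(cepstra):
    # Average hidden activations over frames to obtain one fixed-length
    # vector per utterance for the SVM (a simplifying assumption here).
    return mlp_hidden_activations(cepstra, W1, b1).mean(axis=0)

# Two hypothetical speakers, five random "utterances" each.
X = np.stack([utterance_vector(rng.standard_normal((200, 19))) for _ in range(10)])
y = np.array([0] * 5 + [1] * 5)

svm = SVC(kernel="linear").fit(X, y)
print(svm.predict(X[:2]))
```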


Cite as: Stoll, L., Frankel, J., Mirghafori, N. (2007) Speaker recognition via nonlinear discriminant features. Proc. ITRW on Nonlinear Speech Processing (NOLISP 2007), 27-30

@inproceedings{stoll07_nolisp,
  author={Lara Stoll and Joe Frankel and Nikki Mirghafori},
  title={{Speaker recognition via nonlinear discriminant features}},
  year=2007,
  booktitle={Proc. ITRW on Nonlinear Speech Processing (NOLISP 2007)},
  pages={27--30}
}