We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both the Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute to a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that the contributions of the MLP systems are no longer additive.
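The two feature-extraction paths differ only in which MLP layer is read out: the Tandem/HATS-MLP contributes its output (phone-posterior) activations to a GMM system, while the Speaker-MLP contributes its hidden activations to an SVM system. The sketch below is a minimal illustration of that distinction, not the paper's actual pipeline; it assumes a single-hidden-layer MLP with a sigmoid hidden layer and softmax outputs, and the weight matrices `W1`, `b1`, `W2`, `b2` are hypothetical stand-ins for a trained network.

```python
import numpy as np

def mlp_activations(cepstra, W1, b1, W2, b2):
    """Return (hidden, output) activations for a batch of cepstral frames.

    cepstra : (n_frames, n_cep) acoustic feature vectors
    W1, b1  : hidden-layer weights/bias, shapes (n_cep, n_hidden), (n_hidden,)
    W2, b2  : output-layer weights/bias, shapes (n_hidden, n_targets), (n_targets,)
    """
    # Hidden layer with a sigmoid nonlinearity (assumed; the abstract does
    # not specify the activation function).
    hidden = 1.0 / (1.0 + np.exp(-(cepstra @ W1 + b1)))

    # Softmax output layer: phone posteriors for a Tandem/HATS-MLP,
    # speaker posteriors for a Speaker-MLP.
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    output = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    return hidden, output
```

In this sketch, a Tandem/HATS-style front end would pass `output` (typically after a log transform and decorrelation) to a GMM back end, while a Speaker-MLP front end would pass `hidden` to an SVM back end.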
Cite as: Stoll, L., Frankel, J., Mirghafori, N. (2007) Speaker recognition via nonlinear discriminant features. Proc. ITRW on Nonlinear Speech Processing (NOLISP 2007), 27–30.
@inproceedings{stoll07_nolisp,
  author={Lara Stoll and Joe Frankel and Nikki Mirghafori},
  title={{Speaker recognition via nonlinear discriminant features}},
  year=2007,
  booktitle={Proc. ITRW on Nonlinear Speech Processing (NOLISP 2007)},
  pages={27--30}
}