ITRW on Non-Linear Speech Processing (NOLISP 05)

Barcelona, Spain
April 19-22, 2005

MLP Internal Representation as Discriminative Features for Improved Speaker Recognition

Dalei Wu, Andrew C. Morris, Jacques Koreman

Institute of Phonetics, Saarland University, Saarbrücken, Germany

Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), the projection provided by the pre-squashed outputs of a one-hidden-layer multi-layer perceptron (MLP) trained to recognise speech subunits (phonemes) has previously been shown to significantly increase ASR performance. An analogous approach cannot be applied directly to speaker recognition, because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output for each target speaker. In this paper we show that the output from the second hidden layer of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from the full set of 630 speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.
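As an illustration of the idea only, the following minimal sketch shows how the output of the second hidden layer of a three-hidden-layer MLP could be read off as a feature vector, and how a relative error reduction is computed. Everything specific here is an assumption rather than the authors' configuration: the layer sizes, the sigmoid squashing function, the randomly initialised (untrained) weights, the helper names `init_mlp`, `hidden_layer_features` and `relative_error_reduction`, and the example error rates. Only the 100 output units correspond to the 100 training speakers mentioned in the abstract. MLP training and the GMM back end are omitted.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def hidden_layer_features(frames, params, layer=2):
    """Propagate frames up to hidden layer `layer` (1-based) and return
    that layer's sigmoid output as the projected feature vectors."""
    h = frames
    for weights, bias in params[:layer]:
        h = 1.0 / (1.0 + np.exp(-(h @ weights + bias)))
    return h


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical sizes: 39-dim input frames, hidden layers of 100, 20 and 100
# units, and 100 outputs (one per training speaker).
params = init_mlp([39, 100, 20, 100, 100])

# Five random stand-in frames; real input would be acoustic feature vectors.
frames = np.random.default_rng(1).standard_normal((5, 39))

# One 20-dim feature vector per frame, taken from the second hidden layer.
features = hidden_layer_features(frames, params, layer=2)

# Hypothetical error rates chosen only to illustrate the arithmetic.
reduction = relative_error_reduction(10.0, 2.3)
```

In a complete system of this kind, the projected feature vectors would replace or augment the original acoustic features as input to the GMM speaker models, so the trained MLP acts purely as a feature extractor and can be applied to speakers it was not trained on.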


Bibliographic reference.  Wu, Dalei / Morris, Andrew C. / Koreman, Jacques (2005): "MLP internal representation as discriminative features for improved speaker recognition", In NOLISP-2005, 25-34.