This paper presents a general strategy for robust and efficient speaker recognition. Emphasis is placed on comparing the usefulness of different features calculated from the speech signal at different temporal and spectral resolutions. Specifically, three spectral features are evaluated in a neural network environment: linear-frequency loudness-scaled spectra, auditory spectra from an auditory model, and the lattice coefficients from a warped linear predictor. These features are tested with four different neural network topologies, ranging from speaker identification to speaker verification configurations. The neural net dimensions are also varied to gain an understanding of the complexity of the problem. The tests are based on 40 minutes of speech recorded from a set of 20 native Estonian speakers.
Bibliographic reference. Altosaar, Toomas / Meister, Einar (1995): "Speaker recognition experiments in Estonian using multi-layer feed-forward neural nets", In EUROSPEECH-1995, 333-336.