ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification
Speech and speaker recognition were examined using Self-Organizing Feature Maps (SOFMs) and three different representations of speech: traditional Mel-Cepstral Coefficients (MCC) and the integrated outputs of two different models of the auditory periphery, the Auditory Image Model (AIM) of Patterson and Payton's auditory model (PAM). AIM is a functional model of human hearing up to the level of our initial experience of a sound, that is, our 'auditory image' of the sound. PAM is a neurophysiologically based model of the auditory periphery. In the current experiments, the input vectors for the recognizer were based on the neural activity patterns flowing from the cochlear simulations of AIM and PAM. The phoneme recognition results are based on the 39 phoneme classes of K.F. Lee. The results showed that the auditory models supported better recognition accuracy than MCC on the training and test sets from dialect regions 1 and 2 of the TIMIT database (1140 sentences from 114 speakers for training, 370 sentences from 37 speakers for testing). The two representations made different types of phoneme recognition errors. Speaker recognition experiments using the same representations showed that AIM provided results comparable to those of MCC; PAM did not perform as well.
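The recognizer described above is built on self-organizing feature maps. As a rough illustration of the technique only (not the authors' implementation; the grid size, decay schedules, and function names are illustrative assumptions), a minimal SOFM trainer over feature vectors such as MCC frames might be sketched as:

```python
import numpy as np

def train_som(data, grid_w=8, grid_h=8, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D self-organizing feature map on the row vectors in `data`."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # One weight vector per map unit, randomly initialized.
    weights = rng.normal(size=(grid_h, grid_w, dim))
    # Grid coordinates of every unit, used by the neighborhood function.
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            v = data[i]
            # Best-matching unit (BMU): the unit whose weights are closest to v.
            dists = np.linalg.norm(weights - v, axis=2)
            by, bx = np.unravel_index(np.argmin(dists), dists.shape)
            # Learning rate and neighborhood width shrink linearly over training.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            # Gaussian neighborhood centered on the BMU.
            grid_d2 = (ys - by) ** 2 + (xs - bx) ** 2
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its neighbors toward the input vector.
            weights += lr * h[:, :, None] * (v - weights)
            step += 1
    return weights

def best_matching_unit(weights, v):
    """Return the (row, col) grid position of the unit closest to v."""
    dists = np.linalg.norm(weights - v, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

After training, each input frame is labeled by its best-matching unit, so the map acts as a topology-preserving vector quantizer; classification schemes built on top of the map (as in phoneme or speaker recognition) then operate on the BMU responses.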
Bibliographic reference. Anderson, Timothy R. / Patterson, Roy D. (1994): "Speaker recognition with the auditory image model and self-organizing feature maps: a comparison with traditional techniques", In ASRIV-1994, 153-156.