ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification
This paper presents initial results on imposter detection in telephone speech. The imposter detection problem is defined in terms of a real-world security problem. Perceptual studies are then presented; they give a good estimate of the difficulty of the task, showing that humans correctly classify approximately 85.6% of our benchmark utterances. To design an automatic imposter detector, features that capture differences between speakers are studied. A baseline system based only on 20th-order Linear Predictive Coding (LPC) coefficients correctly classifies 75.0% of the test set. Extracting features only in vowel and semi-vowel regions, i.e. where the all-pole model of the linear predictor is most accurate, increases the classification performance to 80.0%. Adding further features, such as average energy and median pitch, raises the correct classification rate to 83.7%, comparable to the perceptual benchmark. Results are also presented for Mandarin, Japanese and Spanish.
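The abstract does not say how the LPC, energy and pitch features were computed. The sketch below shows one standard way to obtain features of the kinds it names. It assumes the autocorrelation method with a Hamming window and the Levinson-Durbin recursion for the 20th-order LPC coefficients, a per-frame mean squared amplitude for "average energy", and a crude autocorrelation-peak pitch estimate for "median pitch". All function names and parameters (`lpc`, `average_energy`, `median_pitch`, `fmin`, `fmax`) are hypothetical illustrations, not the authors' method.

```python
import math
from statistics import median


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations with the Levinson-Durbin recursion.

    Returns (a, err) with a[0] = 1, i.e. the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order (all-pole model 1/A(z)),
    and err is the residual prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # signal already perfectly predicted; stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for step i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order=20):
    """LPC coefficients of one frame (assumed: Hamming window, autocorrelation method)."""
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    r = autocorrelation(windowed, order)
    if r[0] == 0.0:  # silent frame: nothing to predict
        return [1.0] + [0.0] * order
    a, _ = levinson_durbin(r, order)
    return a


def average_energy(frames):
    """Assumed definition: mean over frames of the per-frame mean squared amplitude."""
    return sum(sum(x * x for x in f) / len(f) for f in frames) / len(frames)


def frame_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Crude pitch estimate (Hz): the lag in [fs/fmax, fs/fmin] maximising autocorrelation."""
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    r = autocorrelation(frame, lag_max)
    best = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return fs / best


def median_pitch(frames, fs):
    """Median of the per-frame pitch estimates."""
    return median([frame_pitch(f, fs) for f in frames])
```

For example, with 8 kHz telephone speech cut into 256-sample frames, `lpc(frame)` returns 21 values (`a[0] = 1` followed by 20 predictor coefficients). To mirror the vowel and semi-vowel restriction described above, the frames passed in would first have to be selected by some phonetic segmenter, which this sketch does not provide.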
Bibliographic reference. Schalkwyk, Johan / Barnard, Etienne / Cole, Ronald A. / Sachs, Jeffrey R. (1994): "Detecting an imposter in telephone speech", In ASRIV-1994, 119-122.