This paper presents initial results on imposter detection in telephone speech. The imposter detection problem is defined in terms of a real-world security problem. Perceptual studies are then presented. These studies provide a good estimate of the difficulty of the task: humans classify approximately 85.6% of our benchmark utterances correctly. To design an automatic imposter detector, features that elicit speaker differences are studied. A baseline system based only on 20th-order linear predictive coefficients (LPC) classifies 75.0% of the test set correctly. Extracting features only in vowel and semi-vowel regions, i.e., where the all-pole model of the linear predictor is most accurate, increases classification performance to 80.0%. Adding further features such as average energy and median pitch raises the correct classification rate to 83.7%, comparable to the perceptual benchmark. Results are also presented for Mandarin, Japanese and Spanish.
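To make the LPC baseline feature concrete, the following is a minimal sketch of computing linear predictive coefficients for a single speech frame. The abstract does not say how the coefficients were estimated; the autocorrelation method with the Levinson-Durbin recursion used here is an assumption, and the function names `autocorrelation` and `lpc` are hypothetical. Windowing, pre-emphasis, and the selection of vowel and semi-vowel regions are left out.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1.0 and the all-pole model is
    1 / (1 + a[1] z^-1 + ... + a[order] z^-order), and err is the
    residual prediction error energy.
    Assumption: autocorrelation method; the paper may have used another.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:
        # Silent frame: no meaningful predictor can be estimated.
        return a, 0.0
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update coefficients 1..i from the order (i - 1) solution.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

Under these assumptions, calling `lpc(frame, 20)` on each frame of a voiced region would give the 20 coefficients `a[1:]` that form the kind of feature vector the baseline system relies on.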
Cite as: Schalkwyk, J., Barnard, E., Cole, R.A., Sachs, J.R. (1994) Detecting an imposter in telephone speech. Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, 119-122
@inproceedings{schalkwyk94_asriv,
  author={Johan Schalkwyk and Etienne Barnard and Ronald A. Cole and Jeffrey R. Sachs},
  title={{Detecting an imposter in telephone speech}},
  year=1994,
  booktitle={Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification},
  pages={119--122}
}