8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Robust Speaker Identification Based on Perceptual Log Area Ratio and Gaussian Mixture Models

David Chow, Waleed Abdulla

The University of Auckland, New Zealand

This paper presents a new feature for speaker identification called the perceptual log area ratio (PLAR). PLAR is closely related to the log area ratio (LAR) feature, but it is derived from perceptual linear prediction (PLP) rather than linear predictive coding (LPC). The PLP-derived PLAR feature is more robust to noise than the LAR feature. PLAR, LAR and MFCC features were tested in a Gaussian mixture model (GMM) based speaker identification system. F-ratio feature analysis showed that the lower-order PLAR and LAR coefficients are superior in classification performance to their MFCC counterparts. The text-independent, closed-set speaker identification accuracies on the KING, YOHO and down-sampled TIMIT databases were, respectively, 85.29%, 97.045% and 98.81% using PLAR; 61.76%, 94.76% and 97.92% using LAR; and 84.31%, 96.48% and 96.73% using MFCC. These results show that PLAR outperforms LAR and MFCC in both clean and noisy environments.
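The two quantities named in the abstract, the log area ratio and the F-ratio, can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not confirm. The LAR mapping uses the common textbook form log((1 + k) / (1 - k)) applied to reflection coefficients k; sign conventions vary between references. The F-ratio is taken as the variance of the per-speaker means divided by the average within-speaker variance. The function names `log_area_ratios` and `f_ratio` are hypothetical, and the PLP analysis that would supply the reflection coefficients for PLAR is not shown.

```python
import math
from statistics import fmean, pvariance


def log_area_ratios(reflection_coeffs):
    """Map reflection coefficients k_i (each in (-1, 1)) to log area ratios.

    Assumed convention: LAR_i = log((1 + k_i) / (1 - k_i)).
    Some references use the opposite sign.
    """
    lars = []
    for k in reflection_coeffs:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        lars.append(math.log((1.0 + k) / (1.0 - k)))
    return lars


def f_ratio(values_by_speaker):
    """F-ratio of a single feature dimension.

    values_by_speaker: one list of feature values per speaker.
    Assumed definition: variance of the speaker means divided by
    the mean of the within-speaker variances.
    """
    speaker_means = [fmean(v) for v in values_by_speaker]
    between = pvariance(speaker_means)
    within = fmean([pvariance(v) for v in values_by_speaker])
    return between / within


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    print(log_area_ratios([0.0, 0.5, -0.5]))
    print(f_ratio([[1.0, 3.0], [5.0, 7.0]]))
```

Under this definition, a larger F-ratio means the speakers' mean values are spread far apart relative to how much each speaker's own values vary, which is the sense in which a coefficient is said to discriminate better.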


Bibliographic reference. Chow, David / Abdulla, Waleed (2004): "Robust speaker identification based on perceptual log area ratio and Gaussian mixture models", in INTERSPEECH-2004, 1761-1764.