INTERSPEECH 2004 - ICSLP
This paper presents a new feature for speaker identification called the perceptual log area ratio (PLAR). PLAR is closely related to the log area ratio (LAR) feature, but it is derived from perceptual linear prediction (PLP) rather than linear predictive coding (LPC) and is more robust to noise than LAR. PLAR, LAR and mel-frequency cepstral coefficient (MFCC) features were tested in a Gaussian mixture model (GMM) based speaker identification system. F-ratio feature analysis showed that the lower-order PLAR and LAR coefficients are superior in classification performance to their MFCC counterparts. The text-independent, closed-set speaker identification accuracies on the KING, YOHO and down-sampled TIMIT databases were, respectively, 85.29%, 97.045% and 98.81% using PLAR; 61.76%, 94.76% and 97.92% using LAR; and 84.31%, 96.48% and 96.73% using MFCC. These results show that PLAR outperforms LAR and MFCC in both clean and noisy environments.
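As a rough illustration of the LAR feature that PLAR builds on, the sketch below computes log area ratios from the reflection coefficients of an all-pole model, using the textbook Levinson-Durbin recursion. It is not the authors' implementation: the function names, the model order and the sign convention log((1 + k) / (1 - k)) are assumptions (some texts use the inverse ratio, which only flips the sign). For PLAR, the autocorrelation would presumably come from the PLP auditory spectrum rather than directly from the waveform; that step is not shown here.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion: return k_1..k_order from r[0..order]."""
    a = [1.0] + [0.0] * order  # coefficients of the prediction polynomial
    err = r[0]                 # prediction error energy at order 0
    ks = []
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("non-positive prediction error (silent frame?)")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        # Update the polynomial from order i-1 to order i.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return ks


def log_area_ratios(ks):
    """Map reflection coefficients (|k| < 1) to log area ratios.

    Assumed convention: LAR_i = log((1 + k_i) / (1 - k_i)).
    """
    return [math.log((1.0 + k) / (1.0 - k)) for k in ks]


if __name__ == "__main__":
    # Hypothetical autocorrelation of a first-order process, r[m] = 0.5 ** m.
    r = [1.0, 0.5, 0.25]
    ks = reflection_coefficients(r, order=2)
    print(ks, log_area_ratios(ks))
```

In a system like the one described, a vector of such coefficients per frame would be the input to the GMM of each speaker; the log mapping spreads out reflection coefficients near ±1, where small changes correspond to large spectral changes.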
Bibliographic reference. Chow, David / Abdulla, Waleed (2004): "Robust speaker identification based on perceptual log area ratio and Gaussian mixture models", In INTERSPEECH-2004, 1761-1764.