It was recently reported in [1][2] that perceptually based linear prediction (PLP) features achieve significantly better speaker recognition results than standard LPC features. The superiority of the PLP model is attributed to a series of perceptually based spectral transforms applied before feature sequences are derived by the standard linear prediction process. This paper further investigates the use of PLP features in speaker identification, focusing on the contribution of each perceptual factor. PLP, as originally proposed by Hermansky [3], was optimised for speech recognition. This paper demonstrates that, not surprisingly, different optimum conditions apply for speaker recognition. In particular, we show the distinct benefit of increasing the number of critical bands from the original 17 up to 64. The increased spectral detail is clearly important in this task: automatic speaker identification (ASI) experiments based on 1000 single-digit tests in a digit-independent codebook scheme give a 2.7% error rate for the modified PLP method, compared with 4.7% and 6.5% for the original PLP and standard LPC models respectively. Furthermore, all the perceptual weightings considered in the PLP model are found to enhance performance to some extent and, in agreement with Gu's findings in speech recognition [4], ASI performance is shown to be relatively insensitive to the precise masking pattern.
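To give a feel for what moving from 17 to 64 critical bands means for spectral resolution, the following minimal sketch spaces band centres uniformly on the Bark axis using the warping from Hermansky's PLP formulation, 6 asinh(f/600) with f in Hz. The 5 kHz analysis bandwidth, the uniform placement, and the function names are assumptions made for illustration only; the abstract does not state how the bands were actually placed.

```python
import math


def hz_to_bark(f_hz):
    """Bark warping from Hermansky's PLP formulation: 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def bark_to_hz(z_bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(z_bark / 6.0)


def band_centres_hz(n_bands, f_max_hz):
    """Centre frequencies (Hz) of n_bands spaced uniformly in Bark from 0 to f_max_hz.

    Uniform Bark spacing is an illustrative assumption, not the paper's stated method.
    """
    step = hz_to_bark(f_max_hz) / (n_bands - 1)
    return [bark_to_hz(i * step) for i in range(n_bands)]


F_MAX_HZ = 5000.0  # assumed analysis bandwidth, not given in the abstract

coarse = band_centres_hz(17, F_MAX_HZ)  # original PLP band count
fine = band_centres_hz(64, F_MAX_HZ)    # increased band count

# Bark spacing between adjacent band centres for each configuration.
coarse_step = hz_to_bark(coarse[1]) - hz_to_bark(coarse[0])
fine_step = hz_to_bark(fine[1]) - hz_to_bark(fine[0])

if __name__ == "__main__":
    print(f"17 bands: {coarse_step:.3f} Bark between centres")
    print(f"64 bands: {fine_step:.3f} Bark between centres")
```

Under these assumptions the 64-band configuration samples the warped spectrum roughly four times more densely than the 17-band one, which is the kind of increased spectral detail the abstract refers to.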
Cite as: Xu, L., Mason, J.S. (1991) Optimization of perceptually-based spectral transforms in speaker identification. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 439-442, doi: 10.21437/Eurospeech.1991-111
@inproceedings{xu91_eurospeech, author={L. Xu and J. S. Mason}, title={{Optimization of perceptually-based spectral transforms in speaker identification}}, year=1991, booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)}, pages={439--442}, doi={10.21437/Eurospeech.1991-111} }