This paper examines the inter-speaker variability of the perceptually weighted features known as PLP (perceptual linear prediction). The motivation is to find an effective transformation between speakers for use in speaker adaptation in speech recognition. Weighted cepstral distance measures are examined, including a combination of the unweighted cepstral distance d-CEP and the root-power-sum slope distortion measure d-RPS. This combination is shown to be the most effective in speaker-independent ASR. It is found that differences between two speakers are exhibited relatively clearly and consistently in the PLP/RPS domain. The attenuation of these differences by a linear transformation forms the basis of the proposed adaptation method for speech recognition. Recognition experiments clearly indicate the effectiveness of the method.
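As an illustration of the quantities named above, the following is a minimal sketch and not the authors' implementation. It assumes that d-CEP is the squared Euclidean distance between cepstral vectors, that d-RPS weights each cepstral difference by its index k before squaring, that the two are combined by a simple weighted sum, and that the speaker transformation is an affine map fitted by least squares on time-aligned frames. The function names, the mixing weight `alpha`, and the fitting procedure are all hypothetical choices made for this example; the abstract does not specify them.

```python
import numpy as np

def d_cep(c1, c2):
    """Unweighted cepstral distance: sum of squared coefficient differences."""
    diff = np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)
    return float(np.sum(diff ** 2))

def d_rps(c1, c2):
    """Root-power-sum style distance: each difference is weighted by its index k."""
    diff = np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)
    k = np.arange(1, diff.shape[-1] + 1)  # coefficient indices 1..K
    return float(np.sum((k * diff) ** 2))

def combined_distance(c1, c2, alpha=0.5):
    """Assumed combination: weighted sum of d-CEP and d-RPS (alpha is hypothetical)."""
    return (1.0 - alpha) * d_cep(c1, c2) + alpha * d_rps(c1, c2)

def fit_linear_map(src, tgt):
    """Least-squares affine map from source-speaker to target-speaker frames.

    src, tgt: arrays of shape (n_frames, n_coeffs), assumed time-aligned.
    Returns W of shape (n_coeffs + 1, n_coeffs); the last row is the bias.
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W

def apply_linear_map(W, feats):
    """Apply the fitted affine map to a set of feature frames."""
    feats = np.asarray(feats, dtype=float)
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])
    return X @ W
```

In this sketch, frames from a new speaker would be passed through `apply_linear_map` before being compared with reference templates using `combined_distance`, so that systematic speaker differences are attenuated before recognition.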
Bibliographic reference. Yong, Gu / Mason, John S. (1989): "Speaker normalization via a linear transformation on a perceptual feature space and its benefits in ASR adaptation", In EUROSPEECH-1989, 1258-1261.