ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

The Effectiveness of Higher Order Spectral Phase Features in Speaker Identification

Daryl Ning, Vinod Chandran

Speech, Audio, Image, and Video Technology Programme, School of Electrical and Electronic Systems Engineering, Queensland University of Technology, Brisbane, Australia

This paper studies the effectiveness of higher order spectra (HOS) phase features in the task of speaker identification. Within the speech processing community, short-time spectral phase information is generally regarded as unimportant for speaker recognition. Indeed, the most commonly used features for speaker recognition are the Mel frequency cepstral coefficients (MFCC), which are defined from the magnitude spectrum only. In our experiments, we utilise features that contain both magnitude and phase spectral information. These HOS phase features are derived by integrating the higher order spectrum along a straight line in bifrequency space. Clean microphone speech from a database of 20 male speakers is used, and Gaussian mixture models (GMMs) are trained on the extracted features. The HOS phase features achieve a correct identification rate of 98.5%, comparable to the 99.4% achieved by the MFCC feature set. The usefulness of short-time phase spectral information is also verified by repeating the experiments after the magnitude spectral information has been removed from the speech data. Finally, the HOS phase features are shown to be more robust than MFCCs to additive white Gaussian noise when training and testing conditions are mismatched.
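The two signal-level ideas in the abstract, a phase feature taken from a line integral in bifrequency space and speech with its magnitude spectrum removed, can be illustrated with a short sketch. It assumes the features follow the radial-line construction used in earlier HOS pattern-recognition work by Chandran and Elgar: the bispectrum B(f1, f2) = X(f1) X(f2) X*(f1 + f2) is summed along the line f2 = a f1 through the origin, and the feature is the phase P(a) of that sum. The abstract does not state the line parameters, frame length, windowing or number of slopes, so `integrated_bispectrum`, `bispectral_phase`, `phase_only`, `a` and `num_points` are all hypothetical. "Removing the magnitude" is likewise assumed here to mean setting every DFT bin of a frame to unit magnitude while keeping its phase.

```python
import cmath
import math


def dtft(x, f):
    """DTFT of a finite frame x at normalised frequency f (cycles/sample)."""
    return sum(xn * cmath.exp(-2j * math.pi * f * n) for n, xn in enumerate(x))


def integrated_bispectrum(x, a, num_points=64):
    """Approximate I(a): B(f1, f2) = X(f1) X(f2) X*(f1 + f2) summed along
    the radial line f2 = a * f1 (hypothetical discretisation).

    Assumption: 0 < a <= 1 keeps the line inside the principal domain
    (f2 <= f1, f1 + f2 <= 0.5), so f1 runs over num_points evenly spaced
    values in (0, 0.5 / (1 + a)].
    """
    if not 0 < a <= 1:
        raise ValueError("slope a must lie in (0, 1]")
    f1_max = 0.5 / (1 + a)
    total = 0j
    for i in range(1, num_points + 1):
        f1 = f1_max * i / num_points
        f2 = a * f1
        # Evaluating the DTFT directly at non-integer frequencies avoids
        # interpolating between FFT bins.
        total += dtft(x, f1) * dtft(x, f2) * dtft(x, f1 + f2).conjugate()
    return total


def bispectral_phase(x, a, num_points=64):
    """Phase feature P(a) of the integrated bispectrum, in radians."""
    return cmath.phase(integrated_bispectrum(x, a, num_points))


def phase_only(x, eps=1e-12):
    """Assumed form of 'magnitude removal': give every DFT bin unit
    magnitude, keep its phase, and invert. The spectrum of a real frame
    stays conjugate-symmetric, so only round-off is discarded by .real."""
    n = len(x)
    spectrum = [dtft(x, k / n) for k in range(n)]
    unit = [c / abs(c) if abs(c) > eps else 0j for c in spectrum]
    return [
        (sum(u * cmath.exp(2j * math.pi * k * m / n) for k, u in enumerate(unit)) / n).real
        for m in range(n)
    ]


if __name__ == "__main__":
    frame = [1, 3, -2, 5, 0, -1, 4, 2, -3, 1]
    for slope in (0.25, 0.5, 1.0):
        print(slope, bispectral_phase(frame, slope))
    print(phase_only(frame))
```

The bispectrum explains why these features carry both kinds of information: each term multiplies complex spectral values, so magnitudes and phases both enter the sum. A delay of d samples multiplies the three factors by e^{-j2πf1d}, e^{-j2πf2d} and e^{+j2π(f1+f2)d}, which cancel, and a positive gain k only scales I(a) by k³, so P(a) is unaffected by frame alignment and level.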


Bibliographic reference. Ning, Daryl / Chandran, Vinod (2004): "The effectiveness of higher order spectral phase features in speaker identification", In ODYS-2004, 245-250.