ISCA Archive Interspeech 2009

Exploration of vocal excitation modulation features for speaker recognition

Ning Wang, P. C. Ching, Tan Lee

To derive spectro-temporal vocal source features that complement the conventional spectral vocal tract features and improve the performance and reliability of a speaker recognition system, the excitation-related modulation properties are studied. Through a multi-band demodulation method, source-related amplitude and phase quantities are parameterized into feature vectors. The proposed features are evaluated first through a set of designed experiments on artificially generated inputs, and then by simulations on a speech database. The designed experiments show that the proposed features capture vocal differences in terms of F0 variation, pitch epoch shape, and relevant excitation details between epochs. In the real-task simulations, when combined with the standard spectral features, both the amplitude-related and the phase-related features are shown to clearly reduce the identification error rate and the equal error rate of a Gaussian mixture model-based speaker recognition system.
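As a rough illustration of what multi-band demodulation involves, the sketch below splits a signal into frequency bands and recovers an amplitude envelope and an instantaneous frequency for each band. It is not the authors' implementation: the ideal FFT band-pass filter, the Hilbert-style analytic signal, the band edges, and the toy amplitude-modulated tone are all assumptions chosen for brevity, and the function names are hypothetical. The abstract does not specify the filter bank, the demodulation algorithm, or how the resulting quantities are turned into feature vectors.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (assumed Hilbert-style construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def bandpass_fft(x, fs, lo, hi):
    """Ideal band-pass filter: zero all FFT bins outside [lo, hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def multiband_demodulate(x, fs, bands):
    """Return (amplitude envelope, instantaneous frequency in Hz) per band."""
    results = []
    for lo, hi in bands:
        z = analytic_signal(bandpass_fft(x, fs, lo, hi))
        envelope = np.abs(z)                            # amplitude quantity
        phase = np.unwrap(np.angle(z))                  # phase quantity
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
        results.append((envelope, inst_freq))
    return results


# Toy input (assumption): 1 kHz carrier, amplitude-modulated at 100 Hz.
fs = 8000
t = np.arange(fs) / fs                                  # one second of samples
x = (1.0 + 0.5 * np.cos(2 * np.pi * 100 * t)) * np.cos(2 * np.pi * 1000 * t)

bands = [(500.0, 1500.0), (2000.0, 3000.0)]             # hypothetical band edges
demod = multiband_demodulate(x, fs, bands)
```

In this toy case the first band contains the whole signal, so its envelope follows the 100 Hz modulation and its instantaneous frequency stays at the carrier, while the second band is essentially empty. A feature extractor in the spirit of the abstract would apply such a decomposition to an excitation-related signal frame by frame and summarize the per-band amplitude and phase quantities, but those details are not given here.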

doi: 10.21437/Interspeech.2009-269

Cite as: Wang, N., Ching, P.C., Lee, T. (2009) Exploration of vocal excitation modulation features for speaker recognition. Proc. Interspeech 2009, 892-895, doi: 10.21437/Interspeech.2009-269

  author={Ning Wang and P. C. Ching and Tan Lee},
  title={{Exploration of vocal excitation modulation features for speaker recognition}},
  booktitle={Proc. Interspeech 2009},