ISCA Archive Odyssey 2004

Features for speaker and language identification

Leena Mary, K. Sri Rama Murty, S.R. Mahadeva Prasanna, Bayya Yegnanarayana

In this paper we examine several features derived from the speech signal for identifying the speaker or the language. Most current systems for speaker and language identification use spectral features from short segments of speech. Additional features, corresponding to the excitation source of the speech signal, can be derived from the residual of the speech signal. At the subsegmental (less than a pitch period) level these features correspond to the glottal vibration in each cycle, and at the suprasegmental (several pitch periods) level they correspond to the intonation and duration characteristics of speech. At the subsegmental level, features can be extracted from the residual signal and also from the phase of the residual signal. The characteristics of a speaker or language can be captured from the spectral or subsegmental features using Autoassociative Neural Network (AANN) models. We demonstrate that these features indeed contain speaker-specific and language-specific information. Since these features come from largely independent sources, they are likely to provide complementary information which, when suitably combined, will increase the effectiveness of speaker and language identification systems.
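As an illustration of what a "residual of the speech signal" can look like in practice, the following is a minimal sketch that assumes the residual is the linear prediction (LP) residual, computed per frame with the autocorrelation method. The abstract does not specify this; the function name `lp_residual`, the default order of 10, and the small regularisation term are assumptions made only for this example, not the authors' implementation.

```python
import numpy as np

def lp_residual(frame, order=10):
    """Return (residual, coeffs) for one frame via autocorrelation-method LP.

    Hypothetical helper for illustration; order=10 is an assumed default.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Tiny diagonal loading (an assumption) to keep the solve well conditioned
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Predicted sample: x_hat[t] = sum_k a[k] * x[t - 1 - k]
    pred = np.zeros(n)
    for k in range(order):
        pred[k + 1:] += a[k] * x[: n - k - 1]
    # The residual is the part of the signal the predictor cannot explain
    return x - pred, a
```

Under this assumption, the predictor coefficients carry the short-term spectral envelope, while the residual retains mainly excitation information, which is the kind of signal the subsegmental features in the abstract would be drawn from.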

Cite as: Mary, L., Murty, K.S.R., Prasanna, S.R.M., Yegnanarayana, B. (2004) Features for speaker and language identification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 323-328

  author={Leena Mary and K. Sri Rama Murty and S.R. Mahadeva Prasanna and Bayya Yegnanarayana},
  title={{Features for speaker and language identification}},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},