ODYSSEY 2004 - The Speaker and Language Recognition Workshop
May 31 - June 3, 2004
In this paper we examine several features derived from the speech signal for identifying the speaker or the language. Most current systems for speaker and language identification use spectral features from short segments of speech. Additional features can be derived from the residual of the speech signal, which corresponds to the excitation source of speech. At the subsegmental level (less than a pitch period), these features correspond to the glottal vibration in each cycle; at the suprasegmental level (several pitch periods), they correspond to the intonation and duration characteristics of speech. At the subsegmental level, features can be extracted from the residual signal and also from the phase of the residual signal. The characteristics of a speaker or language can be captured from the spectral or subsegmental features using Autoassociative Neural Network (AANN) models. We demonstrate that these features indeed contain speaker-specific and language-specific information. Since these features come from more or less independent sources, they are likely to provide complementary information which, when suitably combined, will increase the effectiveness of speaker and language identification systems.
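As an illustration of the residual and residual-phase features mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that the residual is obtained by linear prediction (LP) inverse filtering and that the phase is taken as the cosine of the analytic-signal phase; the abstract states neither choice. The function names, the LP order of 10, and the small regularization term are hypothetical.

```python
import numpy as np

def lp_coefficients(frame, order=10):
    """Autocorrelation-method LP coefficients a[1..order],
    so that s[n] is approximated by sum_k a[k] * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order (lag 0 sits at index n-1 of the full output).
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations R a = r[1..order].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # assumed tiny ridge for numerical stability
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(frame, a):
    """Prediction error e[n] = s[n] - sum_k a[k] * s[n-k],
    treating samples before the frame as zero."""
    frame = np.asarray(frame, dtype=float)
    predicted = np.zeros_like(frame)
    for k, ak in enumerate(a, start=1):
        predicted[k:] += ak * frame[:-k]
    return frame - predicted

def analytic_signal(x):
    """FFT-based analytic signal: keep DC (and Nyquist), double positive
    frequencies, zero negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def residual_phase(residual):
    """Cosine of the analytic-signal phase: residual divided by its Hilbert envelope."""
    z = analytic_signal(residual)
    envelope = np.maximum(np.abs(z), 1e-12)  # guard against division by zero
    return np.real(z) / envelope
```

In such a setup, short blocks of `lp_residual` or `residual_phase` samples would serve as subsegmental feature vectors for a model such as an AANN, while the LP coefficients relate to the spectral (vocal tract) information.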
Bibliographic reference. Mary, Leena / Murty, K. Sri Rama / Prasanna, S.R. Mahadeva / Yegnanarayana, Bishnu (2004): "Features for speaker and language identification", In ODYS-2004, 323-328.