For speaker recognition studies, the speech signal must be processed suitably to capture the speaker-specific information. The excitation source and the vocal tract system characteristics carry complementary speaker-specific information, so these two components need to be separated from the speech signal, even if only approximately. We propose using the linear prediction (LP) residual and the LP coefficients to represent the excitation source and the vocal tract system, respectively. Analysis is performed in a pitch synchronous manner, both to focus on the significant portion of the speech signal in each glottal cycle and to reduce the artifacts that digital signal processing introduces into the extracted features. Finally, the speaker-specific information is captured from the excitation and vocal tract system components using autoassociative neural network (AANN) models. We show that pitch synchronous extraction of information from the residual and the vocal tract system brings out the speaker-specific information much better than pitch asynchronous analysis, as in traditional block processing with a fixed-size analysis window.
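To make the feature extraction step concrete, the following is a minimal sketch of pitch synchronous LP analysis, not the authors' implementation. It assumes that epoch (glottal closure instant) locations are already available from some separate epoch extraction method, which is not shown, and it treats the samples between two consecutive epochs as one analysis segment (one glottal cycle). For each segment it estimates LP coefficients with the standard autocorrelation method and Levinson-Durbin recursion, then inverse-filters the same samples to obtain the LP residual. The function names, the LP order, and the choice of no tapering window are illustrative assumptions; the AANN modeling stage is not shown.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """Autocorrelation r[0..order] of a finite segment (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve for predictor coefficients a[1..p] with s[n] ~ sum_k a[k] s[n-k].

    Returns the coefficients and the final prediction error energy.
    """
    if r[0] <= 0.0:  # silent segment: nothing to predict
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:  # perfectly predictable; stop the recursion
            break
    return a[1:], err


def pitch_synchronous_lp(
    signal: Sequence[float], epochs: Sequence[int], order: int
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: one LP fit per glottal cycle plus the LP residual.

    `epochs` are assumed sample indices of glottal closure instants.
    Each cycle [epochs[i], epochs[i+1]) gets its own coefficient vector
    (vocal tract system), and its samples are inverse-filtered with those
    coefficients to give the residual (excitation source). Samples before
    the start of the signal are treated as zero.
    """
    residual = [0.0] * len(signal)
    coeffs_per_cycle: List[List[float]] = []
    for start, end in zip(epochs[:-1], epochs[1:]):
        a, _ = levinson_durbin(autocorrelation(signal[start:end], order), order)
        coeffs_per_cycle.append(a)
        for n in range(start, end):
            pred = sum(a[k - 1] * signal[n - k] for k in range(1, order + 1) if n - k >= 0)
            residual[n] = signal[n] - pred
    return coeffs_per_cycle, residual
```

For example, a synthetic "voiced" signal made by passing an impulse train (one impulse every 80 samples) through a one-pole filter with pole 0.9 yields per-cycle coefficients close to 0.9 and a residual that is close to 1 at each epoch and close to 0 elsewhere, i.e., the residual recovers the excitation. Traditional block processing would instead use fixed-length frames placed without regard to the epoch locations.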
Bibliographic reference. Reddy Mallidi, Sri Harish / Prahallad, Kishore / Gangashetty, Suryakanth V. / Yegnanarayana, Bayya (2010): "Significance of pitch synchronous analysis for speaker recognition using AANN models", In INTERSPEECH-2010, 669-672.