INTERSPEECH 2004 - ICSLP
Most speaker recognition systems use only low-level, short-term spectral features and ignore high-level, long-term information such as prosody and speaking style. This paper presents a novel eigen-prosody analysis (EPA) approach that captures a speaker's long-term prosodic information for robust speaker recognition in mismatched environments. EPA converts the prosodic feature contours of a speaker's speech into sequences of prosody symbols, which turns speaker recognition into a task similar to full-text document retrieval. Experimental results on the well-known HTIMIT database show that, even when only a small amount of training and test data is available, EPA achieves a relative error rate reduction of about 28.7% compared with the GMM/cepstral mean subtraction (CMS) baseline.
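The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper: pitch contours are quantized into three slope symbols (fall, level, rise), each speaker's "document" is a normalized histogram of symbol bigrams, and the eigen-space is obtained with a truncated SVD as in latent semantic analysis. The function names, the threshold `thr`, and the cosine scoring rule are all hypothetical choices made for illustration.

```python
import numpy as np


def contour_to_symbols(contour, thr=1.0):
    """Quantize frame-to-frame pitch changes into symbols (assumed scheme):
    0 = fall, 1 = level, 2 = rise."""
    delta = np.diff(np.asarray(contour, dtype=float))
    # digitize: delta < -thr -> 0, -thr <= delta < thr -> 1, delta >= thr -> 2
    return np.digitize(delta, [-thr, thr])


def bigram_vector(symbols, n_symbols=3):
    """Treat symbol bigrams as 'words' and return their normalized counts."""
    counts = np.zeros(n_symbols * n_symbols)
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a * n_symbols + b] += 1
    return counts / max(counts.sum(), 1.0)


def fit_eigen_space(speaker_vectors, rank):
    """Stack per-speaker bigram vectors as columns of a term-by-speaker
    matrix and keep the leading left singular vectors as the basis."""
    matrix = np.column_stack(list(speaker_vectors.values()))
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank]


def identify(basis, speaker_vectors, test_vector):
    """Return the enrolled speaker whose projection has the highest
    cosine similarity with the projected test vector."""
    q = basis.T @ test_vector
    best_name, best_score = None, -np.inf
    for name, vec in speaker_vectors.items():
        p = basis.T @ vec
        score = float(q @ p / (np.linalg.norm(q) * np.linalg.norm(p) + 1e-12))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def enroll(contours, thr=1.0):
    """Map each speaker's training contour to its bigram vector."""
    return {name: bigram_vector(contour_to_symbols(c, thr))
            for name, c in contours.items()}
```

In use, one would call `enroll` on the training contours, build the basis with `fit_eigen_space`, and pass the bigram vector of a test utterance to `identify`. A real system would presumably use richer prosodic features (for example energy and duration alongside pitch) and a larger symbol inventory; this sketch only shows how a retrieval-style pipeline over prosody symbols can be put together.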
Bibliographic reference. Chen, Zi-He / Liao, Yuan-Fu / Juang, Yau-Tarng (2004): "Eigen-prosody analysis for robust speaker recognition under mismatch handset environment", In INTERSPEECH-2004, 1421.