8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Eigen-Prosody Analysis for Robust Speaker Recognition under Mismatch Handset Environment

Zi-He Chen (1), Yuan-Fu Liao (2), Yau-Tarng Juang (1)

(1) National Central University, Taiwan
(2) National Taipei University of Technology, Taiwan

Most speaker recognition systems use only low-level, short-term spectral features and ignore high-level, long-term information such as prosody and speaking style. This paper presents a novel eigen-prosody analysis (EPA) approach that captures a speaker's long-term prosodic information for robust speaker recognition under mismatched handset environments. EPA converts the prosodic feature contours of a speaker's speech into sequences of prosody symbols and thereby transforms speaker recognition into a task similar to full-text document retrieval. Experiments on the well-known HTIMIT database show that, even when only a small amount of training and test data is available, EPA achieves a remarkable improvement: about 28.7% relative error rate reduction compared with the GMM baseline using cepstral mean subtraction (CMS).
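The abstract does not specify how contours become symbols or how "documents" are compared, so the following is only a minimal sketch of the general idea under stated assumptions. The symbolization rule (labeling each F0 step as rising, falling, or level against a hypothetical `threshold`), the use of symbol bigrams as "terms", and TF-IDF weighting with cosine-similarity ranking are all illustrative choices, not the authors' method. The "eigen" part of EPA, presumably some eigen-decomposition of the speaker-by-term matrix, is not reproduced here.

```python
import math
from collections import Counter


def prosody_symbols(f0, threshold=2.0):
    """Hypothetical symbolization: label each step of an F0 contour as
    rising (R), falling (F), or level (L) relative to `threshold`."""
    out = []
    for prev, cur in zip(f0, f0[1:]):
        d = cur - prev
        out.append("R" if d > threshold else "F" if d < -threshold else "L")
    return out


def term_counts(symbols, n=2):
    """Treat each n-gram of prosody symbols as a 'word' in a document."""
    return Counter("".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def idf_table(docs):
    """Smoothed inverse document frequency over the enrolled speakers."""
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def weight(counts, idf):
    # Terms never seen during enrollment are dropped from the vector.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def identify(enroll, test_f0):
    """Return the enrolled speaker whose prosody 'document' best matches
    the test utterance (retrieval-style closed-set identification)."""
    docs = {spk: term_counts(prosody_symbols(f0)) for spk, f0 in enroll.items()}
    idf = idf_table(docs)
    vecs = {spk: weight(c, idf) for spk, c in docs.items()}
    query = weight(term_counts(prosody_symbols(test_f0)), idf)
    return max(vecs, key=lambda spk: cosine(query, vecs[spk]))


if __name__ == "__main__":
    # Toy contours in Hz, invented purely for illustration.
    enroll = {
        "spk1": [100, 105, 110, 115, 120, 125],  # steadily rising
        "spk2": [200, 195, 190, 185, 180],       # steadily falling
    }
    print(identify(enroll, [120, 124, 129, 133]))  # prints "spk1"
```

Working on symbol sequences rather than frame-level spectra is what makes document-retrieval machinery applicable: once each speaker is a bag of prosodic "words", standard term weighting and vector-space matching can be reused unchanged.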


Bibliographic reference.  Chen, Zi-He / Liao, Yuan-Fu / Juang, Yau-Tarng (2004): "Eigen-prosody analysis for robust speaker recognition under mismatch handset environment", In INTERSPEECH-2004, 1421.