INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Unsupervised NAP Training Data Design for Speaker Recognition

Hanwu Sun, Bin Ma

Institute for Infocomm Research (I2R), A*STAR, Singapore

Nuisance Attribute Projection (NAP) trained on labeled data is an effective way to improve the performance of state-of-the-art speaker recognition systems by removing unwanted channel and handset variation. However, the need for labeled NAP training data may limit its practical application. In this paper, we propose an unsupervised clustering strategy for designing NAP training data without speaker or channel labels for the utterances. A fast clustering and purifying algorithm is introduced to group the unlabeled NAP training data into speaker-dependent clusters, from which the NAP training data are derived. A GMM-SVM based speaker recognition system is used to evaluate performance. The system using the unsupervised NAP training data design achieves performance similar to that of the system using labeled NAP training data on both the SRE06 1conv-1conv all English trials and the SRE08 short2-short3 Tel-Tel all English trials subtasks.
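The abstract does not spell out the clustering or purification steps, so the sketch below only illustrates the overall pipeline under stated assumptions. The inputs are taken to be GMM supervectors, one per utterance. A simple greedy cosine-similarity clustering stands in for the paper's fast clustering algorithm, and "purifying" is reduced to discarding clusters that are too small. Neither is the authors' method. The resulting pseudo-speaker labels then feed a standard NAP training step: the top-k eigenvectors U of the within-cluster scatter span the nuisance subspace, and each supervector is projected with P = I - UU^T. The function names and the parameters `threshold`, `min_size`, and `k` are hypothetical.

```python
import numpy as np

def cluster_unlabeled(X, threshold=0.8):
    """Hypothetical greedy clustering (a stand-in, not the paper's algorithm).

    Each unit-normalised supervector joins the most cosine-similar existing
    cluster centroid if the similarity reaches `threshold`; otherwise it
    starts a new cluster. Returns one integer pseudo-speaker label per row.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = np.empty(len(X), dtype=int)
    centroids = []  # running sums of the unit vectors in each cluster
    for i, x in enumerate(Xn):
        if centroids:
            C = np.array(centroids)
            sims = (C / np.linalg.norm(C, axis=1, keepdims=True)) @ x
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                labels[i] = best
                centroids[best] = centroids[best] + x
                continue
        labels[i] = len(centroids)
        centroids.append(x.copy())
    return labels

def purify(labels, min_size=2):
    """Hypothetical purification: keep only utterances in clusters of >= min_size."""
    counts = np.bincount(labels)
    return counts[labels] >= min_size

def train_nap(X, labels, k):
    """Standard NAP training: top-k eigenvectors of the within-cluster scatter."""
    dim = X.shape[1]
    W = np.zeros((dim, dim))
    for c in np.unique(labels):
        D = X[labels == c] - X[labels == c].mean(axis=0)  # centre each cluster
        W += D.T @ D
    _, eigvecs = np.linalg.eigh(W)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k]  # columns of U = nuisance directions

def apply_nap(X, U):
    """Project out the nuisance subspace: x -> (I - U U^T) x for every row."""
    return X - (X @ U) @ U.T

# Synthetic demo (made-up data): 3 "speakers" along e0..e2, a channel-like
# nuisance along e5, plus one outlier utterance along e3 that purify() drops.
eye = np.eye(6)
sessions = [10 * eye[s] + a * eye[5] for s in range(3) for a in (-3, -1, 1, 3)]
sessions.append(10 * eye[3])
X = np.array(sessions)

labels = cluster_unlabeled(X, threshold=0.8)
keep = purify(labels, min_size=2)
U = train_nap(X[keep], labels[keep], k=1)
Y = apply_nap(X, U)
```

In this toy case the clusters recover the three synthetic speakers, the learned nuisance direction is e5, and after projection all sessions of a speaker coincide. Real supervectors are far higher-dimensional and noisier, so the clustering threshold and the NAP rank would need tuning on held-out data.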

Index Terms: speaker recognition, speaker diarization, speaker cluster, Nuisance Attribute Projection


Bibliographic reference.  Sun, Hanwu / Ma, Bin (2012): "Unsupervised NAP training data design for speaker recognition", In INTERSPEECH-2012, 1099-1102.