8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Multi-Sample Fusion with Constrained Feature Transformation for Robust Speaker Verification

Ming-Cheung Cheung, Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung

The Hong Kong Polytechnic University, Hong Kong

This paper proposes a single-source multiple-sample fusion approach to text-independent speaker verification. In conventional speaker verification systems, the scores obtained from a claimant's utterances are averaged, and the resulting mean score is used for decision making. Instead of giving all scores an equal weight, we propose assigning a different weight to each score, where each weight depends on the difference between the score value and a speaker-dependent reference score obtained during enrollment. Because the fusion weights depend on the verification scores, we apply a technique called constrained stochastic feature transformation to minimize the mismatch between enrollment and verification data and thereby make the scores more reliable. Experimental results on the 2001 NIST evaluation set show that the proposed fusion approach outperforms the equal-weight approach by 22% in terms of equal error rate and 16% in terms of minimum detection cost.
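
The score-dependent weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting function, so the exponential form below, the `alpha` sharpness parameter, and the function names `fuse_scores` and `equal_weight_fusion` are all assumptions made for illustration, not the authors' formulation. The sketch only shows the general idea: each utterance score receives a weight that depends on its distance from a speaker-dependent reference score, and the weights are normalized to sum to one.

```python
import math


def equal_weight_fusion(scores):
    """Conventional baseline: plain mean of the utterance scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(scores) / len(scores)


def fuse_scores(scores, reference, alpha=1.0):
    """Score-dependent weighted fusion (illustrative assumption only).

    Each weight is proportional to exp(alpha * (score - reference)^2), so
    scores far from the enrollment-time reference get larger weights.
    With alpha = 0 this reduces to the equal-weight mean.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # Scaled squared deviation of each score from the reference score.
    devs = [alpha * (s - reference) ** 2 for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(devs)
    raw = [math.exp(d - peak) for d in devs]
    total = sum(raw)
    weights = [r / total for r in raw]
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights


if __name__ == "__main__":
    utterance_scores = [0.2, 1.5, -0.1]  # made-up example scores
    fused, weights = fuse_scores(utterance_scores, reference=0.0, alpha=1.0)
    print("equal-weight mean:", equal_weight_fusion(utterance_scores))
    print("weighted fusion:  ", fused)
    print("weights:          ", weights)
```

Setting `alpha` to zero recovers the equal-weight baseline, which makes it easy to compare the two fusion rules on the same scores. Whether distant scores should be emphasized or de-emphasized, and how the reference score is estimated, are design choices that the full paper would settle.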

Bibliographic reference. Cheung, Ming-Cheung / Yiu, Kwok-Kwong / Mak, Man-Wai / Kung, Sun-Yuan (2004): "Multi-sample fusion with constrained feature transformation for robust speaker verification", in INTERSPEECH-2004, 1813-1816.