September 22-25, 1997
This paper compares two approaches to background model representation for a text-independent speaker verification task using Gaussian mixture models. We compare speaker-dependent background speaker sets to the use of a universal, speaker-independent background model (UBM). For the UBM, we describe how Bayesian adaptation can be used to derive claimant speaker models, providing a structure leading to significant computational savings during recognition. Experiments are conducted on the 1996 NIST Speaker Recognition Evaluation corpus and it is clearly shown that a system using a UBM and Bayesian adaptation of claimant models produces superior performance compared to speaker-dependent background sets or the UBM with independent claimant models. In addition, the creation and use of a telephone handset-type detector and a procedure called hnorm is also described which shows further, large improvements in verification performance, especially under the difficult mismatched handset conditions. This is believed to be the first use of applying a hand-set- type detector and explicit handset- type normalization for the speaker verification task.
Bibliographic reference. Reynolds, Douglas A. (1997): "Comparison of background normalization methods for text-independent speaker verification", In EUROSPEECH-1997, 963-966.