A speech signal combines attributes that carry information about the speaker, the channel, and the noise. Conventional speaker verification systems train a single generic model for all cases and handle the variation arising from these attributes either through factor analysis or by not modeling it explicitly. We propose a new methodology that partitions the data space according to these factors and trains a separate model for each partition. The partitions may be formed according to any attribute. We train the partition models discriminatively to maximize the separation between them. For classification, we suggest multiple ways of combining the scores from the partitions. Experiments on the NIST2008 database show that our method improves performance over conventional methods when partitions are formed according to speakers. On noisy speech, partitioning by noise yields the best performance.
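As a rough illustration of the score-combination step, the sketch below fuses the scores produced by several partition models into a single verification decision. The abstract does not specify which combination rules were used, so the three rules shown (mean, max, and weighted sum) are generic fusion rules chosen here as assumptions; the names `combine_scores` and `accept`, the `threshold` parameter, and the example scores are likewise hypothetical.

```python
from statistics import mean


def combine_scores(scores, weights=None, rule="mean"):
    """Fuse per-partition verification scores into one score.

    scores  : list of floats, one score per partition model (hypothetical).
    weights : optional list of floats, used only by the "weighted" rule.
    rule    : "mean", "max", or "weighted" (illustrative choices, not
              necessarily the rules evaluated in the paper).
    """
    if not scores:
        raise ValueError("need at least one partition score")
    if rule == "mean":
        return mean(scores)
    if rule == "max":
        return max(scores)
    if rule == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weights must match scores in length")
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        # Normalized weighted sum of the partition scores.
        return sum(w * s for w, s in zip(weights, scores)) / total
    raise ValueError(f"unknown rule: {rule}")


def accept(scores, threshold=0.0, **kwargs):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return combine_scores(scores, **kwargs) >= threshold


if __name__ == "__main__":
    partition_scores = [1.0, 2.0, 3.0]  # made-up scores for illustration
    print(combine_scores(partition_scores, rule="mean"))
    print(combine_scores(partition_scores, rule="max"))
    print(combine_scores(partition_scores, weights=[1, 1, 2], rule="weighted"))
```

The choice of rule changes the decision: with scores of -1.0 and 0.5 and a threshold of 0, the max rule accepts the trial while the mean rule rejects it, which is why a system of this kind would need to compare several combination strategies.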
Bibliographic reference. Perera, Leibny Paola Garcia / Raj, Bhiksha / Nolazco-Flores, Juan Arturo (2013): "Ensemble approach in speaker verification", In INTERSPEECH-2013, 2455-2459.