ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

Linear and Non-Linear Fusion of ALISP-based and GMM systems for Text-independent Speaker Verification

Asmaa El Hannani (1), Dijana Petrovska-Delacrétaz (1), Gérard Chollet (2)

(1) DIVA Group, Informatics Department, University of Fribourg, Switzerland
(2) TSI Department, CNRS-LTCI ENST, Paris, France

Current state-of-the-art speaker verification algorithms use Gaussian Mixture Models (GMM) to estimate the probability density function of the acoustic feature vectors. They are denoted here as global systems. To improve performance, they have to be combined with other classifiers, using different fusion methods. The performance of the final classifier depends on the choice of the individual classifiers and also on the fusion technique used to combine them. In our previous studies we used the data-driven Automatic Language Independent Speech Processing (ALISP) segmentation method to segment the speech data, as a first step of the speaker verification task. Dynamic Time Warping (DTW) was used as the distortion measure between two speech segments, and a logistic regression function was used to determine the optimal weights of the speech segments (including "silences"). This system is denoted as the ALISP-DTW system. In this paper the focus is on the fusion techniques used to combine the ALISP-DTW and GMM systems. We show that a non-linear fusion method (Multi-Layer Perceptron) slightly improves the final fusion result compared to the linear fusion strategies.
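
As an illustration of what linear score-level fusion of two systems can look like, the following is a minimal sketch and not the authors' implementation. It assumes each verification trial yields one score from the GMM system and one from the ALISP-DTW system, and it fits the weights of a weighted sum by logistic regression using plain batch gradient descent. The abstract does not specify which linear fusion strategies were compared, so this choice is an assumption; the function names, the toy scores, the learning rate, and the omission of a bias term (the decision threshold plays that role here) are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse(weights, scores):
    """Linear fusion: weighted sum of per-system scores, squashed to (0, 1)."""
    return sigmoid(sum(w * s for w, s in zip(weights, scores)))


def fit_linear_fusion(trials, labels, lr=0.5, epochs=200):
    """Fit fusion weights by batch gradient descent on the logistic loss.

    trials: list of per-trial score tuples, one score per system.
    labels: 1 for a target (true speaker) trial, 0 for an impostor trial.
    """
    n_systems = len(trials[0])
    weights = [0.0] * n_systems
    for _ in range(epochs):
        grad = [0.0] * n_systems
        for scores, y in zip(trials, labels):
            p = fuse(weights, scores)
            for k in range(n_systems):
                grad[k] += (p - y) * scores[k]
        weights = [w - lr * g / len(trials) for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy scores: (gmm_score, alisp_dtw_score) per trial.
TRIALS = [
    (2.0, 1.5), (1.0, 2.0), (1.5, 0.5), (0.5, 1.0),        # target trials
    (-1.0, -0.5), (-2.0, -1.0), (-0.5, -1.5), (-1.5, 0.2),  # impostor trials
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

WEIGHTS = fit_linear_fusion(TRIALS, LABELS)
FUSED = [fuse(WEIGHTS, scores) for scores in TRIALS]

if __name__ == "__main__":
    print("fusion weights:", WEIGHTS)
    for scores, y, f in zip(TRIALS, LABELS, FUSED):
        print(scores, "label", y, "accept" if f > 0.5 else "reject")
```

A non-linear fusion in the spirit of the Multi-Layer Perceptron mentioned above would replace the weighted sum inside `fuse` with a small network that has at least one hidden layer, so that the decision boundary in the two-score plane no longer has to be a straight line.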


Bibliographic reference. Hannani, Asmaa El / Petrovska-Delacrétaz, Dijana / Chollet, Gérard (2004): "Linear and non-linear fusion of ALISP-based and GMM systems for text-independent speaker verification", In ODYS-2004, 111-116.