EUROSPEECH 2003 - INTERSPEECH 2003
This article describes a new approach to discriminating cues between speakers, aimed at a speaker identification task. To this end, we draw on elements of decision theory. We propose decomposing the conventional feature space (MFCCs) into two subspaces that carry information about the discriminative and the confusable sections of the speech signal. The method is based on the idea that, instead of adapting the speaker models to a new test environment, we require the test utterance to fit the environment of the speaker models. Discriminative sections of the training speech are used to estimate the probability density function (pdf) of a discriminative world model (DM), and confusable sections are used to estimate the pdf of a confusion world model (CM). The two models are then used as a maximum likelihood detector (filter) at the input of the recogniser. The method was evaluated on highly mismatched telephone speech and achieved a considerable improvement (a 16% average gain in performance) over the baseline GMM system.
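As an illustration of the filtering idea only, the following is a minimal sketch of a frame-level maximum likelihood detector. It is not the authors' implementation: the abstract does not give the decision rule, so the sketch assumes that a frame is kept when its log-likelihood under the DM exceeds its log-likelihood under the CM, and that both models are diagonal-covariance GMMs. The function names `diag_gmm_loglik` and `ml_frame_filter` and the `(weights, means, variances)` model layout are hypothetical.

```python
import numpy as np


def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors (e.g. MFCCs)
    weights: (K,) mixture weights; means, variances: (K, D)
    """
    diff = frames[:, None, :] - means[None, :, :]  # (T, K, D)
    # Log-density of each frame under each Gaussian component: (T, K)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    log_weighted = log_comp + np.log(weights)
    # Numerically stable log-sum-exp over the components
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True))).ravel()


def ml_frame_filter(frames, dm, cm):
    """Keep frames that are more likely under DM than under CM.

    Assumed decision rule (not stated in the abstract): keep a frame
    when log p(x | DM) > log p(x | CM).
    dm, cm: hypothetical (weights, means, variances) tuples.
    """
    keep = diag_gmm_loglik(frames, *dm) > diag_gmm_loglik(frames, *cm)
    return frames[keep], keep
```

Under these assumptions, only the retained frames would be passed on to the speaker models for scoring; the frames judged confusable are discarded at the input of the recogniser.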
Bibliographic reference. Mihoubi, M. / Boulianne, Gilles / Dumouchel, Pierre (2003): "Discriminative training and maximum likelihood detector for speaker identification", In EUROSPEECH-2003, 2657-2660.