Interspeech'2005 - Eurospeech
In this paper, we propose a novel discriminative speech frame selection (DSFS) scheme for in-set/out-of-set speaker identification. The scheme seeks to decrease the similarity between the speaker models and the background model (or anti-speaker model), and thereby to increase the accuracy of speaker identification. DSFS consists of two steps: speech frame analysis and discriminative frame selection. Two methods are used to perform DSFS: (i) a Teager Energy Operator (TEO) energy based method and (ii) a MELP pitch based method. An evaluation using both clean and noisy corpora, covering single and multiple recording sessions, shows that both the TEO energy based and the MELP pitch based DSFS schemes reduce the equal error rate (EER) dramatically relative to a traditional GMM-UBM baseline system. Unlike traditional GMM speaker identification, DSFS selects only discriminative speech frames and therefore considers only discriminative features. This selection decreases the overlap between the speaker models and the background model, and improves the performance of in-set/out-of-set speaker identification.
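As an illustration of the two-step structure (frame analysis, then frame selection), the following minimal Python sketch scores frames with the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and keeps the highest-scoring frames. The abstract does not state the selection criterion actually used, so the ranking rule and the names `frame_teo_energies`, `select_frames`, `frame_len`, `hop`, and `keep_ratio` are assumptions made here for illustration only, not the authors' method.

```python
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the first
    and last samples have no neighbor on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teo_energies(x: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Step 1 (frame analysis): mean absolute TEO energy of each frame."""
    psi = teager_energy(x)
    energies = []
    for start in range(0, len(psi) - frame_len + 1, hop):
        frame = psi[start:start + frame_len]
        energies.append(sum(abs(v) for v in frame) / frame_len)
    return energies


def select_frames(energies: Sequence[float], keep_ratio: float) -> List[int]:
    """Step 2 (frame selection): indices of the highest-energy frames.

    Assumption: keeping the top `keep_ratio` fraction of frames is a
    hypothetical stand-in for the paper's discriminative criterion.
    """
    if not energies:
        return []
    n_keep = max(1, int(round(keep_ratio * len(energies))))
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Toy signal: silence followed by a short oscillation.
    signal = [0.0] * 8 + [0.0, 1.0, 0.0, -1.0] * 2
    scores = frame_teo_energies(signal, frame_len=4, hop=4)
    print(select_frames(scores, keep_ratio=0.34))
```

In a full system, only the feature vectors of the selected frames would be passed on to speaker and background model training and scoring; a pitch based variant would replace the frame score with a voicing or pitch measure.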
Bibliographic reference. Zhang, Xianxian / Hansen, John H. L. (2005): "In-set/out-of-set speaker identification based on discriminative speech frame selection", In INTERSPEECH-2005, 2037-2040.