Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Speaker Recognition Based on Discriminative Feature Extraction - Optimization of Mel-Cepstral Features Using Second-Order All-Pass Warping Function

Chiyomi Miyajima (2,1), Hideyuki Watanabe (1), Tadashi Kitamura (2), Shigeru Katagiri (3,1)

(1) ATR - Human Information Processing Research Laboratories, Hikaridai, Kyoto, Japan
(2) Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan
(3) NTT Communication Science Laboratories, Hikaridai, Kyoto, Japan

This paper describes a new framework for designing speaker recognition systems based on the discriminative feature extraction (DFE) method. We apply a mel-cepstral estimation technique to the feature extractor in a Gaussian mixture model (GMM)-based text-independent speaker identification system. The mel-cepstral estimation technique uses the second-order all-pass warping function for frequency transformation. We jointly optimize the frequency warping parameters of the feature extractor and the GMM parameters of the classifier based on a minimum classification error (MCE) criterion. Experimental results show that the frequency-warped scale after optimization is different from traditional linear/mel scales; moreover, the proposed system outperforms conventional systems trained with the generalized probabilistic descent (GPD) method in which only the classifier is optimized.
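The MCE criterion named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact loss, so the form below is an assumption: the commonly used MCE formulation, in which a soft maximum over competing speakers' scores is compared with the true speaker's score and passed through a sigmoid. The function name `mce_loss` and the parameters `eta` and `gamma` are hypothetical, chosen for illustration only.

```python
import math


def mce_loss(scores, true_idx, eta=1.0, gamma=1.0):
    """Smoothed 0-1 loss for one utterance (illustrative, not the paper's code).

    scores   -- per-speaker discriminant values, e.g. average GMM
                log-likelihoods of the utterance (assumed input)
    true_idx -- index of the correct speaker
    eta      -- sharpness of the soft maximum over rival speakers
    gamma    -- slope of the sigmoid
    """
    g_true = scores[true_idx]
    rivals = [g for i, g in enumerate(scores) if i != true_idx]
    # Soft maximum over rivals: (1/eta) * log(mean(exp(eta * g_j))),
    # computed relative to the largest rival score for numerical stability.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    # Misclassification measure: positive when a rival outscores the truth.
    d = soft_max - g_true
    # The sigmoid turns the measure into a differentiable error count.
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

Because this loss is differentiable in the scores, its gradient can in principle be propagated both to the classifier parameters and, through the features, to the warping parameters of the feature extractor, which is the joint optimization the abstract describes.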


Bibliographic reference.  Miyajima, Chiyomi / Watanabe, Hideyuki / Kitamura, Tadashi / Katagiri, Shigeru (1999): "Speaker recognition based on discriminative feature extraction - optimization of mel-cepstral features using second-order all-pass warping function", In EUROSPEECH'99, 779-782.