EUROSPEECH 2001 Scandinavia
Standard Gaussian mixture modelling (GMM) carries no time sequence information (TSI) beyond what may be embedded in the acoustic features. Dynamic time warping (DTW), by contrast, operates directly on TSI by warping two sequences of features into alignment. Here, a hybrid system that embeds DTW in a GMM is presented, and improved automatic speaker verification performance is demonstrated. Tests on 1000 speakers in a fully text-independent, world-model-adapted mode show the equal error rate improving from 4.1% for a standard GMM to 3.8%.
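As an illustration of what warping two feature sequences into alignment means, the following is a minimal sketch of textbook DTW in Python. It is not the hybrid system described in the paper: the function name `dtw_distance`, the Euclidean local distance, and the three-way step pattern are assumptions chosen for the example.

```python
import math


def dtw_distance(a, b):
    """Accumulated distance of the best DTW alignment between two sequences.

    `a` and `b` are lists of feature vectors (each a list of floats).
    Hypothetical helper for illustration only.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two frames (Euclidean, an assumption).
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of `a` repeats a frame of `b`
                cost[i][j - 1],      # frame of `b` repeats a frame of `a`
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# A time-stretched copy aligns at zero cost; a time-reversed copy does not.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]
same_order = dtw_distance(template, stretched)
reversed_order = dtw_distance(template, stretched[::-1])
```

The stretched and reversed sequences contain exactly the same frames, so a score that sums per-frame likelihoods independently, as a standard GMM does, could not tell them apart, whereas the DTW distance depends on frame order.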
Bibliographic reference. Stapert, Robert P. / Mason, John S. (2001): "A segmental mixture model for speaker recognition", In EUROSPEECH-2001, 2509-2512.