ISCA Archive Eurospeech 2001

A segmental mixture model for speaker recognition

Robert P. Stapert, John S. Mason

Standard Gaussian mixture modelling (GMM) does not capture time sequence information (TSI) other than that which might be embedded in the acoustic features. Dynamic time warping (DTW), by contrast, relates directly to TSI, time-warping two sequences of features into alignment. Here, a hybrid system embedding DTW into a GMM is presented, and improved automatic speaker verification performance is demonstrated. Tests on 1000 speakers in a fully text-independent, world-model-adapted mode show the equal error rate falling from 4.1% for a standard GMM to 3.8%.
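As an illustration of the time-warping step the abstract refers to, the following is a minimal sketch of textbook DTW between two feature sequences. It is not the hybrid DTW/GMM system of the paper, whose details are not given here. The function name `dtw_distance`, the Euclidean local distance, and the unconstrained three-way step pattern are all assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Accumulated cost of the best DTW alignment of two feature sequences.

    a, b: sequences of equal-dimension feature vectors (tuples of floats).
    Illustrative only: Euclidean local distance, no slope constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Two toy one-dimensional "feature" sequences; y is a time-stretched copy of x.
x = [(0.0,), (1.0,), (2.0,)]
y = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
print(dtw_distance(x, y))  # prints 0.0: warping absorbs the stretch
```

Because the warping path may repeat frames, the stretched sequence aligns with zero cost, which is the sense in which DTW uses time sequence information that a frame-independent GMM score ignores.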


doi: 10.21437/Eurospeech.2001-414

Cite as: Stapert, R.P., Mason, J.S. (2001) A segmental mixture model for speaker recognition. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 2509-2512, doi: 10.21437/Eurospeech.2001-414

@inproceedings{stapert01_eurospeech,
  author={Robert P. Stapert and John S. Mason},
  title={{A segmental mixture model for speaker recognition}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={2509--2512},
  doi={10.21437/Eurospeech.2001-414}
}