This paper evaluates continuous density hidden Markov models (CDHMMs), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition across increasing amounts of training data. In a comparison of VQ and CDHMMs for text-independent (TI) speaker recognition, VQ performs better than an equivalent CDHMM when trained on a single training version, but is outperformed by the CDHMM when trained on ten versions. In text-dependent (TD) experiments, a comparison of DTW, VQ and CDHMMs shows that DTW outperforms VQ and CDHMMs for sparse amounts of training data, but with more data the performance of the three models is indistinguishable. Further analysis shows the TD architecture to be superior to the TI architecture for speaker recognition, and per-digit TD performance shows the digits zero, one and nine to be good discriminators.
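To make the DTW comparison concrete, the following is a minimal textbook sketch of the DTW cumulative-distance computation between two feature sequences; it is a generic illustration (using scalar features and absolute difference as the local distance), not the authors' exact implementation or feature set.

```python
def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between sequences a and b.

    Illustrative sketch only: scalar frames with an absolute-difference
    local distance, whereas a speaker-recognition system would typically
    use vector-valued acoustic frames (e.g. cepstral coefficients).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # diagonal match
    return cost[n][m]
```

Because the warp absorbs timing differences, two utterances of the same word at different speaking rates can still align with low cost, which is one reason DTW copes well with sparse training data: a single reference template per word suffices.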
Bibliographic reference. Yu, Kin / Mason, John S. / Oglesby, John (1995): "Speaker recognition models", in EUROSPEECH-1995, 629-632.