This paper investigates improvements to the vector quantisation (VQ) distortion method of text-independent speaker identification, using a conventional codebook of instantaneous cepstral vectors from each speaker's training data, and one second-level codebook of transitional cepstral vectors for each codeword of the instantaneous codebook. Results on a 20-speaker database of 30 phonetically rich utterances show a reduction of the error rate from 6.5% for a conventional codebook of size 128 to 5.5% for a codebook which contains 16 transitional codewords for each of the 128 instantaneous codewords (128x16). Results on a 20-speaker database of spoken digits show a reduction of error rate from 3.1% for a conventional (128x0)-codebook to 0.9% for a (128x4)-codebook. Alternatively, a constant error rate can be maintained at a reduced number of codeword comparisons using codeword-specific transitional codebooks. Results also show that, given a sufficient size of transitional codebook, transitional distortion scores after instantaneous preclassification can be superior to purely instantaneous distortion scores.
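The two-level search described above can be illustrated with a minimal sketch. It is not the authors' implementation. The squared Euclidean distance, the equal weighting of the two distortion scores, and all function and variable names (`nearest`, `two_level_distortion`, `identify`, `weight`) are assumptions made for illustration. Each speaker model is taken to be an instantaneous codebook of N codewords plus N transitional codebooks of M codewords each. Every frame is first assigned to its nearest instantaneous codeword, and only that codeword's transitional codebook is then searched.

```python
import numpy as np


def nearest(codebook, vector):
    """Return (index, squared Euclidean distortion) of the closest codeword.

    The squared Euclidean measure is an assumption of this sketch.
    """
    dists = np.sum((codebook - vector) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])


def two_level_distortion(inst_cb, trans_cbs, inst_vecs, trans_vecs):
    """Average instantaneous and transitional distortion over an utterance.

    inst_cb:    (N, d) instantaneous codebook of one speaker
    trans_cbs:  list of N arrays, each (M, d), one transitional codebook
                per instantaneous codeword
    inst_vecs:  (T, d) instantaneous cepstral vectors of the test utterance
    trans_vecs: (T, d) transitional cepstral vectors of the same frames
    """
    inst_total = 0.0
    trans_total = 0.0
    for c, dc in zip(inst_vecs, trans_vecs):
        k, d_inst = nearest(inst_cb, c)         # instantaneous preclassification
        _, d_trans = nearest(trans_cbs[k], dc)  # search only codebook k
        inst_total += d_inst
        trans_total += d_trans
    n = len(inst_vecs)
    return inst_total / n, trans_total / n


def identify(models, inst_vecs, trans_vecs, weight=0.5):
    """Return the speaker with the smallest combined distortion score.

    The linear combination controlled by `weight` is a hypothetical choice;
    weight=1.0 scores on transitional distortion alone.
    """
    scores = {}
    for name, (inst_cb, trans_cbs) in models.items():
        d_i, d_t = two_level_distortion(inst_cb, trans_cbs, inst_vecs, trans_vecs)
        scores[name] = (1.0 - weight) * d_i + weight * d_t
    return min(scores, key=scores.get), scores


# Toy usage with two hypothetical speakers, N = 2 and M = 1.
models = {
    "A": (np.array([[0.0, 0.0], [1.0, 1.0]]),
          [np.array([[0.1, 0.1]]), np.array([[-0.1, -0.1]])]),
    "B": (np.array([[5.0, 5.0], [6.0, 6.0]]),
          [np.array([[0.5, 0.5]]), np.array([[-0.5, -0.5]])]),
}
test_inst = np.array([[0.1, 0.0], [0.9, 1.0]])
test_trans = np.array([[0.1, 0.1], [-0.1, -0.1]])
best, scores = identify(models, test_inst, test_trans)
```

In this sketch each frame costs N + M codeword comparisons per speaker, for example 128 + 16 = 144 for a (128x16)-codebook, because only one of the N transitional codebooks is searched.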
Bibliographic reference. Wagner, Michael / Mason, John S. / Millar, J. Bruce (1995): "Speaker identification using vector quantisation with codeword-specific derivative coding", In EUROSPEECH-1995, 383-386.