5th International Conference on Spoken Language Processing, Sydney, Australia
In speaker recognition systems based on vector quantization (VQ), each speaker is normally assigned a codebook, and classification is done by means of the distortion of the utterance computed against each codebook. In [1] we proposed a system which, instead of having a codebook for each speaker, had a single codebook for all speakers and one histogram per speaker. This histogram was the occupancy rate of each codeword for a given speaker, that is, the probability that the speaker utters the information related to that codeword. We thus approximated the pdf of each speaker by the normalized histogram. In this paper we present an exhaustive study of different measures for comparing histograms: Kullback-Leibler, log-difference of each probability, geometrical distance, and Euclidean distance. We have also carried out an exhaustive study of the properties of the system for each distance in the presence of noise (white and colored) and for different parameterizations: LPC, MFCC, LPC-Cepstrum-OSA (one-sided autocorrelation sequence), and LPC-Cepstrum (cepstrum with and without liftering). As the number of experiment combinations was high, the conclusions were drawn after an analysis of variance (ANOVA) and t-tests. Conclusions, with significance levels, can thus be drawn about the differences and interactions between kind of distance, parameterization, kind of noise, and level of noise.
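The single-codebook, one-histogram-per-speaker scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the nearest-codeword quantizer, and the additive smoothing constant `eps` are all assumptions made for illustration. Only the Kullback-Leibler and Euclidean measures are shown, since the abstract does not give the exact definitions of the log-difference and geometrical distances.

```python
import numpy as np

def quantize(frames, codebook):
    """Return the index of the nearest codeword for each feature frame."""
    # Pairwise Euclidean distances, shape (n_frames, n_codewords).
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def normalized_histogram(frames, codebook, eps=1e-6):
    """Codeword occupancy rates for one speaker, normalized to sum to 1.

    eps is an assumed additive smoothing term (not from the paper) that
    keeps every bin strictly positive so the KL measure stays finite.
    """
    idx = quantize(frames, codebook)
    counts = np.bincount(idx, minlength=len(codebook)).astype(float)
    counts += eps
    return counts / counts.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two histograms."""
    return float(np.sum(p * np.log(p / q)))

def euclidean(p, q):
    """Euclidean distance between two histograms."""
    return float(np.linalg.norm(p - q))

def identify(test_hist, speaker_hists, distance):
    """Return the speaker whose stored histogram is closest to test_hist."""
    return min(speaker_hists, key=lambda name: distance(test_hist, speaker_hists[name]))
```

In use, `normalized_histogram` would be called once per enrolled speaker on that speaker's training frames and once on the test utterance, always with the same shared codebook; `identify` then picks the speaker with the smallest histogram distance.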
Bibliographic reference. Monte, Enric / Arqué, Ramon / Miró, Xavier (1998): "A VQ based speaker recognition system based in histogram distances. text independent and for noisy environments", In ICSLP-1998, paper 1145.