Sixth International Conference on Spoken Language Processing
October 16-20, 2000
New Distance Measures for Text-Independent Speaker Identification
Zhong-Hua Wang, Cheng Wu, David Lubensky
IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Distance measures based on the covariance matrix
of feature vectors have been applied to text-independent speaker
verification and identification. However, some of them do
not satisfy the symmetry property, which is fundamental to
a distance measure. In this paper, we propose several symmetric
distance measures based on the covariance matrix
of feature vectors, and then construct more advanced measures
using the data fusion method. These new distance
measures have good mathematical properties and impose little
computational overhead. We apply these distance measures
to text-independent speaker identification and handset
detection. We also develop a new robust technique for cross-handset
speaker identification and find that compensating
the second-order statistics is important when dealing with
the mismatch caused by different handsets.
The experiments use the cb1 and cb2 data in the LLHDB
corpus for same-handset and cross-handset speaker identification
tests. We find that the use of delta cepstra decreases
the speaker identification error rate by as much as 38%.
The data fusion technique further decreases the error rate
by 11%. When these distance measures are applied to the 2-handset
detection problem, the error rate is 12%. With our new robust
technique, the cross-handset speaker identification error rate
can be decreased by around 17%.
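The symmetry issue raised in the abstract can be illustrated with a short sketch. The abstract does not give the formulas of the proposed measures, so the code below is an assumption chosen only for illustration: it takes a common Gaussian-likelihood-style measure between two covariance matrices, which is not symmetric, and symmetrizes it by averaging the two directions. The function names (covariance, asymmetric_measure, symmetric_measure) are hypothetical and do not come from the paper.

```python
import numpy as np


def covariance(features):
    """Sample covariance of a (frames x dims) feature matrix."""
    return np.cov(features, rowvar=False)


def asymmetric_measure(cov_a, cov_b):
    """tr(A B^-1) - log det(A B^-1) - d.

    Illustrative only (not taken from the paper). It is zero when
    A == B, but in general m(A, B) != m(B, A).
    """
    d = cov_a.shape[0]
    ratio = cov_a @ np.linalg.inv(cov_b)
    _, logdet = np.linalg.slogdet(ratio)
    return np.trace(ratio) - logdet - d


def symmetric_measure(cov_a, cov_b):
    """Average of both directions, so the argument order no longer matters.

    The log-determinant terms cancel, leaving
    0.5 * (tr(A B^-1) + tr(B A^-1)) - d.
    """
    return 0.5 * (asymmetric_measure(cov_a, cov_b)
                  + asymmetric_measure(cov_b, cov_a))


# Synthetic stand-ins for two speakers' cepstral feature vectors.
rng = np.random.default_rng(0)
speaker_x = rng.normal(size=(500, 4))
speaker_y = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 0.5, 3.0])

cov_x = covariance(speaker_x)
cov_y = covariance(speaker_y)

print("asymmetric x->y:", asymmetric_measure(cov_x, cov_y))
print("asymmetric y->x:", asymmetric_measure(cov_y, cov_x))
print("symmetric      :", symmetric_measure(cov_x, cov_y))
```

Averaging the two directions is only one way to obtain a symmetric measure; it adds a second matrix inverse per comparison, which is small for typical cepstral dimensions.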
- H. Gish, "Robust discrimination in automatic speaker
identification", Proc. ICASSP 1991, Vol. 1, pp. 289-
- F. Bimbot and L. Mathan, "Second-order statistical
measures for text-independent speaker identification",
ECSA workshop on automatic speaker recognition,
identification and verification, 1994, pp. 51-54.
- S. Johnson, "Speaker tracking", MPhil thesis, University
of Cambridge, 1997.
- K. R. Farrell, "Discriminatory measures for speaker
recognition", Proceedings of Neural Networks for Signal
Processing, 1995, and references therein.
- D. A. Reynolds, "HTIMIT and LLHDB: Speech corpora
for the study of handset transducer effects",
ICASSP, pp. 1535-1538, May 1997, Munich, Germany.
Wang, Zhong-Hua / Wu, Cheng / Lubensky, David (2000):
"New distance measures for text-independent speaker identification",
In ICSLP-2000, vol. 2, 811-814.