
Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000

New Distance Measures for Text-Independent Speaker Identification
ZhongHua Wang, Cheng Wu, David Lubensky
IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Distance measures [1][2][3] based on the covariance matrix
of feature vectors have been applied to text-independent speaker
verification and identification. However, some of them do
not satisfy the symmetry property, which is fundamental to
a distance measure. In this paper, we propose several symmetric
distance measures based on the covariance matrix
of feature vectors, and then construct some advanced measures
using the data fusion method [4]. These new distance
measures have good mathematical properties and impose little
computational overhead. We apply these distance measures
to text-independent speaker identification and handset
detection. We also develop a new robust technique for cross-handset
speaker identification, and find that compensating
the second-order statistics is important when dealing with
the mismatch caused by different handsets.
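
To make the symmetry requirement concrete, the following is a minimal sketch and not the measures proposed in the paper, whose formulas are not given here. It assumes a generic covariance-based dissimilarity, tr(A B^-1) - log det(A B^-1) - d, which is zero when A equals B but changes when the arguments are swapped, and symmetrizes it by averaging both orderings. All function names are hypothetical.

```python
import numpy as np


def covariance(features):
    """Sample covariance of a (frames x dims) feature matrix."""
    return np.cov(features, rowvar=False)


def asymmetric_measure(cov_a, cov_b):
    """tr(A B^-1) - log det(A B^-1) - d.

    Non-negative and zero when A == B, but in general
    asymmetric_measure(A, B) != asymmetric_measure(B, A).
    Illustrative choice only; not taken from the paper.
    """
    d = cov_a.shape[0]
    m = cov_a @ np.linalg.inv(cov_b)
    _, logdet = np.linalg.slogdet(m)
    return np.trace(m) - logdet - d


def symmetric_measure(cov_a, cov_b):
    """Average of both orderings, so swapping arguments changes nothing."""
    return 0.5 * (asymmetric_measure(cov_a, cov_b)
                  + asymmetric_measure(cov_b, cov_a))


def identify(test_cov, enrolled):
    """Return the enrolled speaker whose covariance is closest to test_cov.

    `enrolled` is a hypothetical dict mapping speaker name -> covariance.
    """
    return min(enrolled, key=lambda name: symmetric_measure(test_cov, enrolled[name]))
```

In this sketch the log-determinant terms cancel in the average, so the symmetric version reduces to 0.5 * (tr(A B^-1) + tr(B A^-1)) - d, which costs little more than the asymmetric one.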
The experiments use the cb1 and cb2 data in the LLHDB
corpus [5] for same-handset and cross-handset speaker identification
tests. We find that the use of delta cepstra decreases
the speaker identification error rate by as much as 38%.
The data fusion technique can further decrease the error rate
by 11%. When these distance measures are applied to the 2-handset detection
problem, the error rate is 12%. With our new robust
technique, the cross-handset speaker identification error rate
can be decreased by around 17%.
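
As an illustration of what adding delta cepstra to the feature vectors involves, here is a minimal sketch using the common regression formula over neighbouring frames. The abstract does not say how the deltas were computed, so the window size, the edge padding, and the function names are all assumptions.

```python
import numpy as np


def delta(features, window=2):
    """Regression-based deltas of a (frames x dims) array.

    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), n = 1..window.
    Edge frames are handled by repeating the first and last frame
    (an assumption; other conventions exist).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]
        behind = padded[window - n: window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def add_deltas(cepstra, window=2):
    """Stack static cepstra and their deltas into one feature matrix."""
    return np.hstack([cepstra, delta(cepstra, window)])
```

The stacked matrix doubles the feature dimension, so a covariance matrix estimated from it also captures how the cepstra change over time.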
References
[1] H. Gish, "Robust discrimination in automatic speaker
identification", Proc. ICASSP 1991, Vol. 1, pp. 289-292.
[2] F. Bimbot and L. Mathan, "Second-order statistical
measures for text-independent speaker identification",
ECSA workshop on automatic speaker recognition,
identification and verification, 1994, pp. 51-54.
[3] S. Johnson, "Speaker tracking", MPhil thesis, University
of Cambridge, 1997.
[4] K. R. Farrell, "Discriminatory measures for speaker
recognition", Proceedings of Neural Networks for Signal
Processing, 1995, and references therein.
[5] D. A. Reynolds, "HTIMIT and LLHDB: Speech corpora
for the study of handset transducer effects",
ICASSP, pp. 1535-1538, May 1997, Munich, Germany.
Bibliographic reference.
Wang, ZhongHua / Wu, Cheng / Lubensky, David (2000):
"New distance measures for text-independent speaker identification",
In ICSLP2000, vol. 2, 811-814.