Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

New Distance Measures for Text-Independent Speaker Identification

Zhong-Hua Wang, Cheng Wu, David Lubensky

IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA

Distance measures [1][2][3] based on the covariance matrix of feature vectors have been applied to text-independent speaker verification and identification. However, some of them do not satisfy the symmetry property, which is fundamental to a distance measure. In this paper, we propose several symmetric distance measures based on the covariance matrix of feature vectors, and then construct some advanced measures using the data fusion method [4]. These new distance measures have good mathematical properties and impose little computational overhead. We apply these distance measures to text-independent speaker identification and handset detection. We develop a new robust technique for cross-handset speaker identification, and find that compensating the second-order statistics is important when dealing with the mismatch caused by different handsets. The experiments use the cb1 and cb2 data in the LLHDB corpus [5] for same-handset and cross-handset speaker identification tests. We find that the use of delta cepstra decreases the speaker identification error rate by as much as 38%. The data fusion technique can further decrease the error rate by 11%. When these distance measures are applied to a two-handset detection problem, the error rate is 12%. With our new robust technique, the cross-handset speaker identification error rate can be decreased by around 17%.
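
As an illustration of what a symmetric covariance-based distance can look like, the sketch below uses the arithmetic-harmonic sphericity measure known from the speaker recognition literature. It is not necessarily one of the measures proposed in this paper, and the function names (covariance, ahs_distance, identify) and the nearest-model decision rule are assumptions made only for this example.

```python
import numpy as np

def covariance(features):
    """Sample covariance of feature vectors; features has shape (n_frames, n_dims)."""
    return np.cov(features, rowvar=False)

def ahs_distance(c1, c2):
    """Arithmetic-harmonic sphericity distance between two covariance matrices.

    d(C1, C2) = log( tr(C1 C2^-1) * tr(C2 C1^-1) ) - 2 log(D),
    where D is the feature dimension. Swapping C1 and C2 only swaps the
    two trace factors, so the measure is symmetric by construction.
    """
    dim = c1.shape[0]
    # tr(C1 C2^-1) equals tr(C2^-1 C1), which solve() gives without an explicit inverse.
    t12 = np.trace(np.linalg.solve(c2, c1))
    t21 = np.trace(np.linalg.solve(c1, c2))
    return float(np.log(t12 * t21) - 2.0 * np.log(dim))

def identify(enrolled, test_cov):
    """Return the enrolled speaker whose covariance is closest to test_cov.

    enrolled: dict mapping a speaker label to that speaker's covariance matrix
    (a hypothetical enrollment structure used only in this sketch).
    """
    return min(enrolled, key=lambda name: ahs_distance(enrolled[name], test_cov))

if __name__ == "__main__":
    c_a = np.diag([1.0, 2.0, 3.0])
    c_b = np.diag([3.0, 1.0, 2.0])
    print(ahs_distance(c_a, c_b), ahs_distance(c_b, c_a))  # the two values agree
    print(identify({"a": c_a, "b": c_b}, c_a + 0.01 * np.eye(3)))
```

Each speaker is summarized by a single covariance matrix, so scoring a test utterance against an enrolled speaker costs only two linear solves and two traces, which reflects the low computational overhead typical of covariance-based measures.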

References

  1. H. Gish, "Robust discrimination in automatic speaker identification", Proc. ICASSP 1991, Vol. 1, pp. 289-292.
  2. F. Bimbot and L. Mathan, "Second-order statistical measures for text-independent speaker identification", ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, 1994, pp. 51-54.
  3. S. Johnson, "Speaker tracking", MPhil thesis, University of Cambridge, 1997.
  4. K. R. Farrell, "Discriminatory measures for speaker recognition", Proceedings of Neural Networks for Signal Processing, 1995, and references therein.
  5. D. A. Reynolds, "HTIMIT and LLHDB: Speech corpora for the study of handset transducer effects", ICASSP, pp. 1535-1538, May 1997, Munich, Germany.



Bibliographic reference.  Wang, Zhong-Hua / Wu, Cheng / Lubensky, David (2000): "New distance measures for text-independent speaker identification", In ICSLP-2000, vol.2, 811-814.