8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Automated Speaker Recognition in Real World Conditions: Controlling the Uncontrollable

Hirotaka Nakasone

Federal Bureau of Investigation, USA

The current development of automatic speaker recognition technology may provide a new method to augment or replace the traditional method offered by qualified experts using aural and spectrographic analysis. The most promising of these automated technologies are based on statistical hypothesis testing methods involving likelihood ratios. The null hypothesis is generated using a universal background model composed of a large population of speakers. However, techniques with excellent performance in standardized evaluations (NIST trials) may not perform as well in the real world. By defining and controlling the input speech samples carefully, we show quantitative differences in performance for different factors affecting a speaker population, and discuss ongoing efforts to improve the accuracy rate for use in real world conditions. This paper addresses two factors that affect system performance: the speech signal duration and the signal-to-noise ratio.
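As a rough illustration of the likelihood-ratio idea mentioned above, the following is a minimal sketch only, not the system described in the paper. It assumes that both the claimed-speaker model and the background model are diagonal-covariance Gaussian mixtures, and it scores a set of feature frames by the average per-frame log-likelihood ratio. The function names (`gmm_log_likelihood`, `average_llr`, `snr_db`) and the tuple layout `(weights, means, variances)` are hypothetical choices made for this example. The last helper shows the standard decibel definition of signal-to-noise ratio, one of the two factors the paper studies.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            # Log density of a univariate Gaussian, summed over dimensions.
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def average_llr(frames, speaker_gmm, background_gmm):
    """Mean per-frame log-likelihood ratio: speaker model vs. background model.

    Each model is a hypothetical (weights, means, variances) tuple.
    A positive score favors the claimed speaker; a negative score favors
    the background population.
    """
    total = 0.0
    for frame in frames:
        total += (gmm_log_likelihood(frame, *speaker_gmm)
                  - gmm_log_likelihood(frame, *background_gmm))
    return total / len(frames)


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from average signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Toy one-dimensional, single-component models (illustrative values only).
    speaker = ([1.0], [[0.0]], [[1.0]])
    background = ([1.0], [[3.0]], [[1.0]])
    frames = [[0.0], [0.5], [-0.2]]
    print("average LLR:", average_llr(frames, speaker, background))
    print("SNR (dB):", snr_db(100.0, 1.0))
```

In a sketch like this, shorter recordings give fewer frames to average over, and noisier recordings shift the features away from both models, which is one way to see why duration and signal-to-noise ratio could influence the score.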

Bibliographic reference.  Nakasone, Hirotaka (2003): "Automated speaker recognition in real world conditions: controlling the uncontrollable", In EUROSPEECH-2003, 697-700.