ODYSSEY 2004 - The Speaker and Language Recognition Workshop
May 31 - June 3, 2004
This paper deals with automatic speaker recognition in forensic applications and with handling mismatched technical conditions in a Bayesian framework for evaluating the strength of evidence. Mismatch in recording conditions has to be considered when estimating the strength of evidence, i.e., how likely it is that a questioned recording (trace) has been produced by a suspected speaker rather than by any other person from a relevant population. In forensic speaker recognition, a Bayesian interpretation framework and a corpus-based methodology are employed to estimate such a likelihood ratio.
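As an illustration of the likelihood ratio described above, the following is a minimal sketch and not the authors' implementation. It assumes that the evidence is a single similarity score between the trace and the suspect's model, and that the scores under each hypothesis can be modelled by a Gaussian; the abstract does not specify how the densities are estimated. The function name `likelihood_ratio` and all score values are hypothetical.

```python
from statistics import NormalDist


def likelihood_ratio(evidence_score, same_speaker_scores, population_scores):
    """Ratio of the density of the evidence score under two hypotheses.

    Numerator: the suspect produced the trace (modelled from scores of the
    suspect's own recordings against the suspect's model).
    Denominator: someone else from the relevant population produced it
    (modelled from scores of the trace against population speakers).
    Assumption: each score set is summarised by a fitted Gaussian.
    """
    h0 = NormalDist.from_samples(same_speaker_scores)
    h1 = NormalDist.from_samples(population_scores)
    return h0.pdf(evidence_score) / h1.pdf(evidence_score)


# Hypothetical scores, for illustration only (not data from the paper).
same_speaker_scores = [9.1, 10.4, 11.0, 9.8, 10.2]
population_scores = [1.5, 2.2, 0.8, 3.1, 2.0, 1.2]

lr = likelihood_ratio(10.0, same_speaker_scores, population_scores)
print(f"LR = {lr:.3g}")
```

A ratio above 1 supports the hypothesis that the suspect is the source of the trace, and a ratio below 1 supports the alternative. If the score sets come from databases recorded under different conditions, the two fitted distributions are no longer comparable, which is the mismatch problem the paper addresses.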
Although automatic speaker recognition has shown high performance under controlled conditions, the conditions in which recordings are made by the police (anonymous calls and wiretapping) cannot be controlled and are far from ideal. Differences in the phone handset, in the transmission channel and in the recording tools introduce variability over and above the variability of human speech. In this paper we focus on how to estimate and deal with differences in the recording conditions of the databases used: detecting whether there is good discrimination between speakers within a database, detecting significant mismatch in recording conditions, and applying statistical compensation in case of mismatch.
Bibliographic reference. Alexander, Anil / Botti, Filippo / Drygajlo, Andrzej (2004): "Handling mismatch in corpus-based forensic speaker recognition", In ODYS-2004, 69-74.