2001: A Speaker Odyssey - The Speaker Recognition Workshop
June 18-22, 2001
The Federal Bureau of Investigation (FBI) has been involved in forensic voice comparison for over four decades and has a strong interest in automatic speaker recognition (ASR) technology. Until now, the FBI has relied on trained voice analysts using standardized aural and spectrographic techniques to assess the match between two voice samples. These methods are labor intensive and, to some degree, subjective. Within the past five years, published reports, the NIST evaluations, and speaker recognition workshops held around the world have indicated that ASR systems are achieving a high level of performance under certain conditions. To gauge the maturity of this technology specifically as applied to forensic cases, the FBI initiated an evaluation using its own database in late 1998. The goal of this study was to evaluate candidate systems and identify the best forensic system for immediate procurement and in-house evaluation. The study was completed in 1999. Several signal processing and classification technologies were found to be critical for use with forensic voice data. Although ASR technology is not perfect, it is approaching a performance level acceptable for application in real-world forensic environments.
Using information gained from the evaluation described above, the following key issues will be addressed. (1) The formulation of a set of minimum core technologies to meet forensic requirements. (2) How to use information obtained from error analysis of the ASR evaluation scores to guide further improvement; the errors in this case are either missed detections or false alarms. (3) The need for research on the effects of signal quality and quantity on ASR performance. (4) The inclusion of input speech quality measures in the ASR decision process to improve overall recognition performance. (5) How to draw a boundary (or perimeter) within which an ASR system can be applied meaningfully. (6) Legal requirements in the U.S., given the potential impact of ASR technology on the forensic community.
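As an illustration of the two error types named in item (2), the sketch below counts missed detections and false alarms at a fixed decision threshold. It is not the FBI's evaluation procedure: the function name `error_rates`, the score values, and the threshold are all hypothetical, and it assumes that higher scores indicate a more likely same-speaker match.

```python
from typing import Sequence, Tuple


def error_rates(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (miss_rate, false_alarm_rate) at the given threshold.

    Assumption: a trial is accepted as "same speaker" when its score
    is greater than or equal to the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    # Missed detection: a same-speaker trial that was rejected.
    misses = sum(1 for s in target_scores if s < threshold)
    # False alarm: a different-speaker trial that was accepted.
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)


# Made-up scores for illustration only; not data from the study.
target = [2.1, 0.4, 1.7, -0.3]
nontarget = [-1.2, 0.6, -0.8, -2.0, 0.1]
miss_rate, fa_rate = error_rates(target, nontarget, threshold=0.5)
print(f"miss rate = {miss_rate:.2f}, false alarm rate = {fa_rate:.2f}")
```

Raising the threshold trades false alarms for missed detections, and lowering it does the reverse, so an error analysis of this kind is normally repeated across a range of thresholds.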
[Paper not available] Presentation
Bibliographic reference. Nakasone, Hirotaka (2001): "Speaker recognition in forensic environment", In ODYSSEY-2001 [presentation only].