11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Approaching Human Listener Accuracy with Modern Speaker Verification

Ville Hautamäki (1), Tomi Kinnunen (2), Mohaddeseh Nosratighods (3), Kong Aik Lee (1), Bin Ma (1), Haizhou Li (1)

(1) A*STAR, Singapore
(2) University of Eastern Finland, Finland
(3) University of New South Wales, Australia

Being able to recognize people from their voice is a natural ability that we take for granted. Recent advances have brought significant improvements in automatic speaker recognition performance. Besides being able to process large amounts of data in a fraction of the time required by a human, automatic systems are now able to deal with diverse channel effects. The goal of this paper is to examine how a state-of-the-art automatic system performs in comparison with human listeners, and to investigate a strategy for a human-assisted form of automatic speaker recognition, which is useful in forensic investigation. We set up an experimental protocol using data from the NIST SRE 2008 core set. A total of 36 listeners from three sites, namely Australia, Finland and Singapore, participated in the listening experiments. The state-of-the-art automatic system achieved a 20% error rate, whereas the fusion of human listeners achieved 22%.


Bibliographic reference. Hautamäki, Ville / Kinnunen, Tomi / Nosratighods, Mohaddeseh / Lee, Kong Aik / Ma, Bin / Li, Haizhou (2010): "Approaching human listener accuracy with modern speaker verification", In INTERSPEECH-2010, 1473-1476.