5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Speech Pre-Processing Against Intentional Imposture In Speaker Recognition

Dominique Genoud (1), Gérard Chollet (2)

(1) IDIAP, CH-1920 CP 592 Martigny, Switzerland
(2) CNRS URA-820 ENST, Rue Barrault 46 75634 Paris, France

Recently, several large-scale text-dependent speaker verification systems have been tested. They show that an Equal Error Rate below 1% can be obtained on a test set score distribution. So far, most impostor tests have been performed with speakers who do not really try to fool the system. This can be explained by the lack of databases recorded for this purpose and by the difficulty a normal speaker has in transforming his or her voice characteristics. Nevertheless, current automatic analysis/synthesis techniques, such as the Harmonic plus Noise Model (H+N), allow very good speech/speaker transformations. It thus becomes possible to transform the voice of one speaker into the voice of another in order to carry out deliberate impostures. This paper evaluates this kind of intrusive imposture and proposes a new speech pre-processing method, based on harmonic subtraction, that makes speaker verification less sensitive to these spectral transformations. A state-of-the-art Hidden Markov Model (HMM) system is used as the reference system to assess the transformation results. The speech is parameterised by linear prediction cepstral coefficients (LPCC). The results are obtained on a database of telephone-quality speech. The speaker verification system works in text-dependent mode.
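As an illustration of the Equal Error Rate quoted above, the following minimal sketch estimates it from two lists of verification scores. It is not taken from the paper: the function name `equal_error_rate`, the convention that a higher score means a better match, and the toy scores are all assumptions made for the example.

```python
def equal_error_rate(client_scores, impostor_scores):
    """Return (eer, threshold) at the point where the false-acceptance and
    false-rejection rates are closest.

    Assumption: a higher score means "more likely the claimed speaker".
    """
    best_gap, best_eer, best_thr = None, None, None
    # Candidate thresholds: every score observed in either list.
    for thr in sorted(set(client_scores) | set(impostor_scores)):
        # False rejection: a true client scores below the threshold.
        frr = sum(s < thr for s in client_scores) / len(client_scores)
        # False acceptance: an impostor scores at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy scores (hypothetical, not results from the paper).
clients = [0.9, 0.8, 0.7, 0.6]
impostors = [0.1, 0.2, 0.3, 0.65]
eer, thr = equal_error_rate(clients, impostors)
print(eer, thr)  # 0.25 0.65
```

In this toy case one impostor out of four scores above the weakest client, so both error rates meet at 25%. A successful voice transformation would shift impostor scores upward and raise this figure, which is the effect the proposed pre-processing aims to limit.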

Full Paper
Sound Examples
#1 - Source Sound
#2 - Target Sound
#3 - Source noise HMM transformation
#4 - Random background noise HMM transformation
#5 - Original Sound
#6 - Harmonics subtraction Sound

Bibliographic reference. Genoud, Dominique / Chollet, Gérard (1998): "Speech pre-processing against intentional imposture in speaker recognition", in ICSLP-1998, paper 0734.