5th International Conference on Spoken Language Processing
Recently, some large-scale text-dependent speaker verification systems have been tested. They show that an Equal Error Rate (EER) of less than 1% can be obtained on a test set score distribution. So far, the majority of impostor tests have been performed using speakers who do not really try to fool the system. This can be explained by the lack of databases recorded for this purpose, and by the difficulty a normal speaker has in transforming his voice characteristics. Nevertheless, current automatic analysis/synthesis techniques, such as the Harmonic plus Noise Model (H+N), allow very good speech/speaker transformations. Thus, it becomes possible to transform the voice of one speaker into the voice of another speaker in order to make deliberate impostures. This paper evaluates this kind of intrusive imposture and proposes a new speech pre-processing method, based on harmonic subtraction, that makes speaker verification less sensitive to these spectral transformations. A state-of-the-art Hidden Markov Model is used as the reference system to assess the transformation results. The speech is parameterised by LPCC coefficients. The results are obtained on a database of telephone-quality speech. The speaker verification system works in text-dependent mode.
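As an illustration of the Equal Error Rate figure quoted above, the following is a minimal sketch of how an EER can be estimated from two score lists. It is not the evaluation code used in the paper: the function name `equal_error_rate`, the convention that a higher score means "accept", and the toy scores are all assumptions made for the example.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER from verification scores (higher score = accept).

    Hypothetical helper for illustration only. It sweeps every observed
    score as a decision threshold and returns (eer, threshold) at the
    point where false acceptance and false rejection rates are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (invented for the example, not taken from the paper).
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

In this toy case one impostor trial out of four scores above one genuine trial, so both error rates meet at 25%. An imposture by voice transformation would show up in such a measurement as impostor scores shifted toward the genuine score distribution, which raises the EER.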
#1 - Source Sound
#2 - Target Sound
#3 - Source noise HMM transformation
#4 - Random background noise HMM transformation
#5 - Original Sound
#6 - Harmonics subtraction Sound
Bibliographic reference. Genoud, Dominique / Chollet, Gérard (1998): "Speech pre-processing against intentional imposture in speaker recognition", In ICSLP-1998, paper 0734.