INTERSPEECH 2004 - ICSLP
In our previous paper, we presented a speaker identification system using a multi-SNR multi-band method and reported its robustness against environmental noise. This paper describes two modifications to the system that further enhance its noise robustness. First, 1/f noise is used instead of white Gaussian noise to create the noisy data for training the multi-SNR GMMs. Second, the recombination weights for the subband likelihoods are adjusted automatically based on the estimated subband noise power. For performance evaluation, text-independent speaker identification experiments were conducted on test data created by mixing clean speech with five kinds of environmental noise ("bus", "car", "office", "lobby", and "restaurant") at SNRs of 0 and 10 dB. With the two modifications, the identification error rate was reduced by 30.3% on average compared with the baseline multi-SNR multi-band method using white Gaussian noise and equal weights.
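The abstract specifies neither the 1/f noise generator nor the weighting rule, so the following Python sketch is only an illustration under stated assumptions. It approximates 1/f noise with the Voss-McCartney algorithm (a swapped-in generator, not necessarily the one the authors used), mixes it into a signal at a target SNR as one would when building multi-SNR training data, and recombines per-subband GMM log-likelihoods with a hypothetical rule in which each subband's weight is proportional to the inverse of its estimated noise power. The names `pink_noise`, `mix_at_snr`, `noise_weights`, and `identify` are hypothetical.

```python
import math
import random


def pink_noise(n_samples, n_rows=16, seed=0):
    """Approximate 1/f (pink) noise with the Voss-McCartney algorithm.

    Row k is redrawn every 2**(k+1) samples, so slowly changing rows
    contribute low-frequency power; a fresh white term fills the top octave.
    """
    rng = random.Random(seed)
    rows = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    total = sum(rows)
    out = []
    for i in range(1, n_samples + 1):
        k = (i & -i).bit_length() - 1  # index of the lowest set bit of i
        if k < n_rows:
            new = rng.gauss(0.0, 1.0)
            total += new - rows[k]
            rows[k] = new
        out.append(total + rng.gauss(0.0, 1.0))
    return out


def power(x):
    """Mean-square power of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the given SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    scale = math.sqrt(power(clean) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(clean, noise)]


def noise_weights(subband_noise_power):
    """Hypothetical rule: weight each subband by 1 / (estimated noise power),
    normalized so the weights sum to 1. Not taken from the paper."""
    inv = [1.0 / max(p, 1e-12) for p in subband_noise_power]
    total = sum(inv)
    return [w / total for w in inv]


def identify(subband_loglik, weights):
    """Return the speaker whose weighted sum of subband log-likelihoods is largest.

    subband_loglik maps speaker id -> list of per-subband log-likelihoods.
    """
    def score(speaker):
        return sum(w * l for w, l in zip(weights, subband_loglik[speaker]))

    return max(subband_loglik, key=score)
```

For example, with two subbands whose estimated noise powers are 1.0 and 4.0, `noise_weights` gives weights 0.8 and 0.2, so a speaker that scores well in the cleaner subband can win even when equal weights would favor a different speaker. The inverse-power rule is chosen here only because it is simple and monotone in the noise estimate; the paper's actual adjustment may differ.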
Bibliographic reference. Yoshida, Kenichi / Takagi, Kazuyuki / Ozeki, Kazuhiko (2004): "Improved model training and automatic weight adjustment for multi-SNR multi-band speaker identification system", In INTERSPEECH-2004, 1749-1752.