15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Improving the Performance of Far-Field Speaker Verification Using Multi-Condition Training: The Case of GMM-UBM and i-Vector Systems

Anderson R. Avila (1), Milton Sarria-Paja (1), Francisco J. Fraga (2), Douglas O'Shaughnessy (1), Tiago H. Falk (1)

(1) INRS-EMT, Canada
(2) UFABC, Brazil

While considerable work has been done to characterize the detrimental effects of channel variability on automatic speaker verification (ASV) performance, little attention has been paid to the effects of room reverberation. This paper investigates the effects of room acoustics on the performance of two far-field ASV systems: GMM-UBM (Gaussian mixture model - universal background model) and i-vector. We show that ASV performance is severely affected by reverberation, particularly for i-vector based systems. Three multi-condition training methods are then investigated to mitigate these effects. The first uses matched train/test speaker models selected according to estimated reverberation time (RT) values. The second uses two-condition training with clean and reverberant models. Lastly, a four-condition training setup is proposed in which models for clean, mild, moderate, and severe reverberation levels are used. Experimental results show that the first and third multi-condition training methods provide significant gains in performance relative to the baseline, with the latter being more suitable for practical resource-constrained far-field applications.
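As an illustration of the four-condition setup described above, the following minimal Python sketch picks one of four speaker models from an estimated RT value and then scores the test utterance against it. It is not the authors' implementation: the RT boundaries, the names `select_condition` and `verify`, and the scoring callback are assumptions made for illustration only, since the abstract states neither thresholds nor scoring details.

```python
from bisect import bisect_right

# Hypothetical RT boundaries in seconds; the abstract does not give
# the values separating clean, mild, moderate, and severe conditions.
RT_BOUNDARIES = [0.2, 0.5, 0.9]
CONDITIONS = ["clean", "mild", "moderate", "severe"]


def select_condition(estimated_rt):
    """Map an estimated reverberation time (seconds) to a condition label."""
    if estimated_rt < 0:
        raise ValueError("reverberation time must be non-negative")
    # bisect_right counts how many boundaries are <= estimated_rt,
    # which is the index of the matching condition.
    return CONDITIONS[bisect_right(RT_BOUNDARIES, estimated_rt)]


def verify(score_fn, models, features, estimated_rt, threshold):
    """Score features against the model matching the estimated RT.

    score_fn is a placeholder for any scorer (e.g. a GMM-UBM
    log-likelihood ratio or an i-vector similarity); models maps each
    condition label to the claimed speaker's model for that condition.
    """
    condition = select_condition(estimated_rt)
    score = score_fn(models[condition], features)
    return condition, score >= threshold
```

Under these assumptions only one model per trial is scored, chosen by the RT estimate, so the cost at test time stays the same as for a single-condition system while the enrolled models cover four reverberation levels.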


Bibliographic reference. Avila, Anderson R. / Sarria-Paja, Milton / Fraga, Francisco J. / O'Shaughnessy, Douglas / Falk, Tiago H. (2014): "Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systems", in INTERSPEECH-2014, 1096-1100.