8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Distributed Speaker Recognition using Earth Mover's Distance

Yoshiyuki Umeda, Shingo Kuroiwa, Satoru Tsuge, Fuji Ren

The University of Tokushima, Japan

In this paper, we focus on distributed speaker recognition, a technique in which quantized feature parameters are sent to a server, as in distributed speech recognition. The traditional approach uses a Gaussian mixture model (GMM) trained by maximum likelihood. Because the output probability functions of the GMM are continuous density functions, they are difficult to fit to quantized data. To overcome this problem, we propose a novel speaker recognition technique that requires no speaker model training. The proposed method directly calculates the distance between the set of quantized feature parameters of the registered speech and the set of quantized feature parameters of the test speech, using the Earth Mover's Distance (EMD) as the distance measure. We conduct text-independent speaker identification experiments with the proposed method. Compared with the traditional GMM, the proposed method yielded a relative error reduction of 80% on quantized data.
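The distance computation described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: each utterance is summarized as a histogram of codebook indices (`counts`), the ground distance between codewords is Euclidean, and the EMD is solved as a small min-cost flow problem. The names `emd`, `identify`, and the toy `codebook` are hypothetical and not taken from the paper.

```python
import math

def euclidean(u, v):
    """Ground distance between two codeword vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def emd(counts_a, counts_b, codebook):
    """EMD between two codeword-count histograms {codeword id: count}.

    Both histograms are scaled to the same integer mass, then the
    transportation problem is solved by successive shortest paths.
    """
    ids_a, ids_b = sorted(counts_a), sorted(counts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    m, n = len(ids_a), len(ids_b)
    source, sink = 0, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # node -> list of edge indices
    edges = []                              # edge: [target, residual cap, cost]

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0, -cost])  # reverse edge

    mass = total_a * total_b  # common total mass after scaling
    for i, ca in enumerate(ids_a):
        add_edge(source, 1 + i, counts_a[ca] * total_b, 0.0)
        for j, cb in enumerate(ids_b):
            add_edge(1 + i, 1 + m + j, mass,
                     euclidean(codebook[ca], codebook[cb]))
    for j, cb in enumerate(ids_b):
        add_edge(1 + m + j, sink, counts_b[cb] * total_a, 0.0)

    total_cost, remaining = 0.0, mass
    while remaining > 0:
        # Bellman-Ford shortest path in the residual graph.
        dist = [math.inf] * len(graph)
        prev_edge = [-1] * len(graph)
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev_edge[v], updated = dist[u] + cost, e, True
            if not updated:
                break
        # Find the bottleneck along the path, then push flow along it.
        push, v = remaining, sink
        while v != source:
            e = prev_edge[v]
            push = min(push, edges[e][1])
            v = edges[e ^ 1][0]
        v = sink
        while v != source:
            e = prev_edge[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            v = edges[e ^ 1][0]
        total_cost += push * dist[sink]
        remaining -= push
    return total_cost / mass  # average cost per unit of mass moved

def identify(test_counts, registered, codebook):
    """Return the registered speaker whose histogram is nearest in EMD."""
    return min(registered, key=lambda s: emd(test_counts, registered[s], codebook))

# Toy data: a 3-entry codebook and two registered speakers (all hypothetical).
codebook = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0)}
registered = {"A": {0: 6, 1: 2}, "B": {2: 4}}
print(identify({0: 3, 1: 1}, registered, codebook))
```

No model is trained in this sketch: registration only stores a histogram per speaker, and identification is a nearest-neighbour search under the EMD. A production system would likely use a dedicated transportation or linear programming solver instead of the simple Bellman-Ford loop shown here.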


Bibliographic reference.  Umeda, Yoshiyuki / Kuroiwa, Shingo / Tsuge, Satoru / Ren, Fuji (2004): "Distributed speaker recognition using earth mover's distance", In INTERSPEECH-2004, 2389-2392.