8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Speaker Recognition Using Local Models

Ryan Rifkin

Honda Research Institute, USA

Many of the problems arising in speech processing are characterized by extremely large training and testing sets, which constrains the kinds of models and algorithms that lead to tractable implementations. In particular, we would like the amount of processing associated with each test frame to be sublinear (e.g., logarithmic) in the number of training points. In this paper, we fit a smoothed kernel regression model at each test frame, using only those training frames that are close to that test frame. The problem is made tractable via approximate nearest-neighbor techniques. The resulting system is conceptually simple, easy to implement, and fast, with performance comparable to more sophisticated methods. Preliminary results on a NIST speaker recognition task are presented, demonstrating the feasibility of the method.
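As a rough illustration of the local-model idea described in the abstract, the following minimal sketch scores one test frame with a kernel-weighted average over its nearest training frames. It is not the paper's implementation: the Gaussian kernel, the Nadaraya-Watson form of the estimate, the +1/-1 label convention, and the names `k_nearest`, `local_kernel_score`, `k`, and `bandwidth` are all assumptions made for this example. The exact brute-force neighbor search used here is linear in the training set size; it only stands in for the approximate nearest-neighbor structure that would be needed to reach sublinear cost per test frame.

```python
import math


def k_nearest(train_x, query, k):
    """Return (distance, index) pairs for the k training frames closest to query.

    Exact brute-force search, used here only as a stand-in for an
    approximate nearest-neighbor index (assumption of this sketch).
    """
    dists = [(math.dist(x, query), i) for i, x in enumerate(train_x)]
    dists.sort()
    return dists[:k]


def local_kernel_score(train_x, train_y, query, k=3, bandwidth=1.0):
    """Kernel-weighted average of the labels of the k nearest training frames.

    Hypothetical convention: train_y holds +1 for target-speaker frames
    and -1 for all other frames, so the score lies in [-1, 1].
    """
    neighbors = k_nearest(train_x, query, k)
    # Gaussian kernel weights: closer frames contribute more.
    weights = [math.exp(-(d * d) / (2.0 * bandwidth ** 2)) for d, _ in neighbors]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed; fall back to an unweighted average.
        return sum(train_y[i] for _, i in neighbors) / len(neighbors)
    return sum(w * train_y[i] for w, (_, i) in zip(weights, neighbors)) / total


# Toy data: two well-separated clusters of 2-dimensional "frames".
train_x = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
train_y = [1.0, 1.0, -1.0, -1.0]

score_near_target = local_kernel_score(train_x, train_y, (0.0, 0.05), k=2)
score_near_other = local_kernel_score(train_x, train_y, (5.0, 5.0), k=2)
```

In a full system, per-frame scores like these would presumably be combined over all frames of a test utterance; how that combination is done is not specified in the abstract.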


Bibliographic reference. Rifkin, Ryan (2003): "Speaker recognition using local models", in EUROSPEECH-2003, 3009-3012.