EUROSPEECH 2003 - INTERSPEECH 2003
Many of the problems arising in speech processing are characterized by extremely large training and testing sets, which constrain the kinds of models and algorithms that lead to tractable implementations. In particular, we would like the amount of processing associated with each test frame to be sublinear (e.g., logarithmic) in the number of training points. In this paper, we consider fitting a smoothed kernel regression model at each test frame, using only those training frames that are close to that test frame. The problem is made tractable via the use of approximate nearest-neighbor techniques. The resulting system is conceptually simple, easy to implement, and fast, with performance comparable to more sophisticated methods. Preliminary results on a NIST speaker recognition task are presented, demonstrating the feasibility of the method.
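As an illustration of the local-model idea described in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian kernel, a fixed neighbor count `k`, labels of +1 and -1, and a Nadaraya-Watson style weighted average as the smoothed regression; none of these choices are stated in the abstract. The function names `k_nearest` and `local_kernel_score` are hypothetical. The neighbor search here is exact and brute-force, so its cost is linear in the number of training frames; an approximate nearest-neighbor index would take its place to reach the sublinear cost the abstract asks for.

```python
import heapq
import math


def k_nearest(train_x, query, k):
    """Return (squared_distance, index) pairs for the k frames closest to query.

    Assumption: brute-force exact search, linear in len(train_x). An
    approximate nearest-neighbor structure would replace this function.
    """
    dists = (
        (sum((a - b) ** 2 for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
    )
    return heapq.nsmallest(k, dists)


def local_kernel_score(train_x, train_y, query, k=3, bandwidth=1.0):
    """Gaussian-weighted average of the labels of the k nearest training frames.

    Assumption: Nadaraya-Watson smoothing restricted to the neighbors;
    the kernel and bandwidth are illustrative choices.
    """
    neighbors = k_nearest(train_x, query, k)
    weights = [math.exp(-d2 / (2.0 * bandwidth ** 2)) for d2, _ in neighbors]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: no usable local evidence.
        return 0.0
    weighted = sum(w * train_y[i] for w, (_, i) in zip(weights, neighbors))
    return weighted / total


# Toy data (made up for illustration): two well-separated clusters of frames.
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
train_y = [1, 1, 1, -1, -1, -1]
score_near_first = local_kernel_score(train_x, train_y, (0.05, 0.05))
score_near_second = local_kernel_score(train_x, train_y, (5.0, 5.05))
```

Only the `k` neighbors enter the weighted average, so the per-frame regression cost is independent of the training set size once the neighbors have been found.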
Bibliographic reference. Rifkin, Ryan (2003): "Speaker recognition using local models", In EUROSPEECH-2003, 3009-3012.