We present a system that uses nearest neighbour classification at the state level of the hidden Markov model. Current speech recognition systems commonly use Gaussian mixtures with a very large number of densities. We propose to carry this idea to the extreme, so that each training observation serves as a prototype of its own. This approach is well known and widely used in other areas of pattern recognition and has some immediate advantages over other classification approaches, but has never been applied to speech recognition. We evaluate the proposed method on the SieTill corpus of continuous digit strings and on the large-vocabulary EPPS English task. The results show that the nearest neighbour approach outperforms conventional systems when training data is sparse.
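The idea of treating each observation as its own prototype can be illustrated with a minimal sketch. It is not the authors' implementation: the squared Euclidean distance, the use of the nearest-prototype distance as a per-state cost, and all names (`nn_state_cost`, `best_state`, `state_prototypes`) and toy values are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_state_cost(observation: Vector, prototypes: List[Vector]) -> float:
    """Cost of an observation in one state: distance to its nearest prototype.

    Every training observation aligned to the state is kept as a prototype,
    so no mixture parameters are estimated.
    """
    return min(squared_distance(observation, p) for p in prototypes)


def best_state(observation: Vector,
               state_prototypes: Dict[str, List[Vector]]) -> str:
    """Return the state whose nearest prototype is closest to the observation."""
    return min(state_prototypes,
               key=lambda s: nn_state_cost(observation, state_prototypes[s]))


# Hypothetical toy data: two states, two 2-dimensional prototypes each.
state_prototypes = {
    "s0": [(0.0, 0.0), (0.2, 0.1)],
    "s1": [(1.0, 1.0), (0.9, 1.2)],
}

frame = (0.8, 1.0)
print(best_state(frame, state_prototypes))
```

In a full recognizer, a per-state cost of this kind would take the place of the Gaussian mixture emission score during decoding; the sketch only shows the frame-level comparison between states.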
Bibliographic reference. Deselaers, Thomas / Heigold, Georg / Ney, Hermann (2007): "Speech recognition with state-based nearest neighbour classifiers", In INTERSPEECH-2007, 2093-2096.