8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

PocketSUMMIT: Small-Footprint Continuous Speech Recognition

I. Lee Hetherington


We present PocketSUMMIT, a small-footprint version of our SUMMIT continuous speech recognition system. As portable devices become smaller and more powerful, speech is becoming an increasingly important input modality for them. PocketSUMMIT is implemented as a variable-rate continuous-density hidden Markov model (HMM) with diphone context-dependent models. We explore various Gaussian parameter quantization schemes and find that compression of 8:1 or more is achievable with little reduction in accuracy. We also show how the quantized parameters can be used for rapid table lookup. To further reduce memory usage, we explore first-pass language model pruning in a finite-state transducer (FST) framework, as well as FST and n-gram weight quantization and bit packing. PocketSUMMIT can currently run a moderate-vocabulary conversational speech recognition system in real time in a few MB of memory on current PDAs and smartphones.
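
The idea of scoring with quantized Gaussian parameters through table lookup can be illustrated with a minimal sketch. It is not the scheme used in PocketSUMMIT, whose details are not given in this abstract. The sketch assumes diagonal-covariance Gaussians and a hypothetical per-dimension codebook of (mean, variance) pairs; all function names and codebook values are invented for illustration. Each Gaussian stores one small index per dimension, and for each input frame the per-dimension log-density of every codeword is computed once, so scoring a Gaussian reduces to summing table entries.

```python
import math


def log_term(x, mean, var):
    """Log-density of one dimension of a diagonal Gaussian."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def direct_score(x, means, variances):
    """Unquantized log-likelihood of frame x under a diagonal Gaussian."""
    return sum(log_term(xd, m, v) for xd, m, v in zip(x, means, variances))


def quantize(means, variances, codebooks):
    """Map each dimension's (mean, var) to the index of its nearest codeword.

    Nearest is measured by squared Euclidean distance, an assumption made
    for simplicity; a real system might use a likelihood-based criterion.
    """
    codes = []
    for m, v, book in zip(means, variances, codebooks):
        best = min(
            range(len(book)),
            key=lambda k: (book[k][0] - m) ** 2 + (book[k][1] - v) ** 2,
        )
        codes.append(best)
    return codes


def build_frame_table(x, codebooks):
    """Once per frame: table[d][k] is the log-density of x[d] under codeword k."""
    return [[log_term(xd, m, v) for (m, v) in book]
            for xd, book in zip(x, codebooks)]


def lookup_score(codes, table):
    """Score one quantized Gaussian using only table lookups and additions."""
    return sum(row[k] for row, k in zip(table, codes))


# Hypothetical two-dimensional example with two codewords per dimension.
codebooks = [
    [(0.0, 1.0), (1.0, 1.0)],
    [(0.0, 0.5), (2.0, 2.0)],
]
codes = quantize([0.9, 1.8], [1.1, 1.9], codebooks)
frame = [0.5, 1.5]
table = build_frame_table(frame, codebooks)
print(codes, lookup_score(codes, table))
```

In this sketch, a codebook of at most 256 entries per dimension lets each index fit in one byte, compared with eight bytes for a mean and a variance stored as 32-bit floats, ignoring the codebook itself. The table is shared by every Gaussian evaluated on the same frame, which is where the lookup saves computation.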

Bibliographic reference. Hetherington, I. Lee (2007): "PocketSUMMIT: small-footprint continuous speech recognition", in INTERSPEECH-2007, 1465-1468.