INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Spelling as a Complementary Strategy for Speech Recognition

Keith Vertanen (1), Per Ola Kristensson (2)

(1) Department of Computer Science, Montana Tech of the University of Montana, Butte, Montana, USA
(2) School of Computer Science, University of St Andrews, St Andrews, Fife, UK

We compare a variety of strategies for incorporating spelling to create more robust voice-only speech interfaces. These strategies use different combinations of speaking the word, spelling the word, and spelling the word using a phonetic alphabet. For correcting a single recognition error, either spelling the word or speaking and spelling the word reduced error rates substantially. Phonetic spelling was very accurate, with error rates on a 5K task approaching zero. Most importantly, multiple input strategies could be used simultaneously with only a modest degradation in performance compared to allowing only a single input strategy. Thus, our work shows that spelling-based input strategies offer the potential of a simple, natural and effective way for users to both avoid and correct recognition errors.
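To make the input strategies named in the abstract concrete, the following minimal sketch lists the token sequence a user might utter for a given word under each one. It is an illustration only, not the authors' implementation: the function name `spoken_forms`, the strategy labels, and the choice of the ICAO/NATO spelling alphabet are all assumptions, since the abstract does not say which phonetic alphabet was used.

```python
# Illustrative sketch only. The ICAO/NATO alphabet is an assumption;
# the abstract does not name the phonetic alphabet used in the study.
NATO = {
    "a": "alfa", "b": "bravo", "c": "charlie", "d": "delta", "e": "echo",
    "f": "foxtrot", "g": "golf", "h": "hotel", "i": "india", "j": "juliett",
    "k": "kilo", "l": "lima", "m": "mike", "n": "november", "o": "oscar",
    "p": "papa", "q": "quebec", "r": "romeo", "s": "sierra", "t": "tango",
    "u": "uniform", "v": "victor", "w": "whiskey", "x": "x-ray",
    "y": "yankee", "z": "zulu",
}


def spoken_forms(word):
    """Return the tokens a user would say for `word` under each strategy.

    The strategy labels are hypothetical names chosen for this sketch.
    """
    word = word.lower()
    # Keep only letters covered by the alphabet table (a-z).
    letters = [c for c in word if c in NATO]
    return {
        "speak": [word],
        "spell": letters,
        "speak_and_spell": [word] + letters,
        "phonetic_spell": [NATO[c] for c in letters],
    }


if __name__ == "__main__":
    for strategy, tokens in spoken_forms("cat").items():
        print(f"{strategy}: {' '.join(tokens)}")
```

For the word "cat", the phonetic-spelling form is "charlie alfa tango", while the speak-and-spell form is "cat c a t". A recognizer that accepts several strategies at once would need to handle all of these sequences for the same target word.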

Index Terms: speech recognition, error correction


Bibliographic reference. Vertanen, Keith / Kristensson, Per Ola (2012): "Spelling as a complementary strategy for speech recognition", In INTERSPEECH-2012, 2294-2297.