We integrate a supervised machine learning mechanism for detecting erroneous words in the output of a speech recognizer with a two-tier error-correction approach that features (1) a noisy-channel model that replaces erroneous words with generic words, and (2) a phonetic-similarity mechanism that refines the generic words based on a short list of candidate interpretations. Our results, obtained on a corpus of 341 referring expressions, show that the first tier improves interpretation performance, and the second tier yields further improvements.
Cite as: Zukerman, I., Partovi, A., Kim, S.N. (2015) Context-dependent error correction of spoken referring expressions. Proc. Interspeech 2015, 2032-2036, doi: 10.21437/Interspeech.2015-461
@inproceedings{zukerman15_interspeech,
  author={Ingrid Zukerman and Andisheh Partovi and Su Nam Kim},
  title={{Context-dependent error correction of spoken referring expressions}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2032--2036},
  doi={10.21437/Interspeech.2015-461}
}