We introduce a novel approach to decoding in speech recognition (termed attention-shift decoding) that attempts to mimic aspects of human speech recognition responsible for robustness in processing conversational speech. Our approach is a radical departure from traditional decoding algorithms for speech recognition. We propose a method to first identify reliable regions of the speech signal and then use these to help decode the unreliable regions, thus conditioning on potentially non-consecutive portions of the signal. We test this approach in a second-pass rescoring framework and compare it to standard second-pass rescoring. On a conversational telephone speech recognition task (EARS RT-03 CTS evaluation), our approach shows an improvement of 2.6% absolute when using oracle information for detecting the reliable regions, and 0.4% absolute when detecting the reliable regions automatically.
Bibliographic reference. Kumaran, Raghunandan / Bilmes, Jeff / Kirchhoff, Katrin (2007): "Attention shift decoding for conversational speech recognition", In INTERSPEECH-2007, 1493-1496.