ISCA Archive Interspeech 2007

Attention shift decoding for conversational speech recognition

Raghunandan Kumaran, Jeff Bilmes, Katrin Kirchhoff

We introduce a novel approach to decoding in speech recognition (termed attention-shift decoding) that attempts to mimic aspects of human speech recognition responsible for robustness in processing conversational speech. Our approach is a radical departure from traditional decoding algorithms for speech recognition. We propose a method to first identify reliable regions of the speech signal and then use these to help decode the unreliable regions, thus conditioning on potentially non-consecutive portions of the signal. We test this approach in a second-pass rescoring framework and compare it to standard second-pass rescoring. On a conversational telephone speech recognition task (EARS RT-03 CTS evaluation), our approach shows an improvement of 2.6% absolute when using oracle information for detecting the reliable regions, and 0.4% absolute when detecting the reliable regions automatically.
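
As a rough illustration of the idea of conditioning on reliable regions during second-pass rescoring, the following toy sketch works on an N-best list. It is not the authors' algorithm: the slot-aligned N-best format, the posterior threshold, and the names `slot_posteriors`, `reliable_islands`, `rescore`, and `second_pass_score` are all assumptions made for this example.

```python
from collections import defaultdict

def slot_posteriors(nbest):
    """Per-slot word posteriors from an N-best list.

    Assumption: every hypothesis has the same number of word slots and
    the hypothesis weights sum to 1 (a simplification for illustration).
    """
    n_slots = len(nbest[0][0])
    post = [defaultdict(float) for _ in range(n_slots)]
    for words, weight in nbest:
        for i, word in enumerate(words):
            post[i][word] += weight
    return post

def reliable_islands(post, threshold=0.8):
    """Slots whose top word has posterior >= threshold (hypothetical criterion)."""
    islands = {}
    for i, dist in enumerate(post):
        word, p = max(dist.items(), key=lambda kv: kv[1])
        if p >= threshold:
            islands[i] = word
    return islands

def rescore(nbest, second_pass_score, threshold=0.8):
    """Pick the best hypothesis among those that agree with the reliable slots.

    second_pass_score(words, islands) is a caller-supplied, hypothetical
    scoring function that may condition on the reliable words.
    """
    islands = reliable_islands(slot_posteriors(nbest), threshold)
    consistent = [words for words, _ in nbest
                  if all(words[i] == w for i, w in islands.items())]
    if not consistent:
        # Fall back to the full list if no hypothesis matches every island.
        consistent = [words for words, _ in nbest]
    return max(consistent, key=lambda words: second_pass_score(words, islands))

nbest = [
    (["i", "want", "to", "go"], 0.60),
    (["i", "want", "two", "go"], 0.25),
    (["i", "wont", "to", "go"], 0.15),
]
# Toy second-pass score that prefers "two" in the one unreliable slot.
best = rescore(nbest, lambda words, islands: 1.0 if words[2] == "two" else 0.0)
```

In this toy setup the first, second, and fourth slots are treated as reliable, so only the third slot is left open for the second-pass score to decide.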


doi: 10.21437/Interspeech.2007-432

Cite as: Kumaran, R., Bilmes, J., Kirchhoff, K. (2007) Attention shift decoding for conversational speech recognition. Proc. Interspeech 2007, 1493-1496, doi: 10.21437/Interspeech.2007-432

@inproceedings{kumaran07_interspeech,
  author={Raghunandan Kumaran and Jeff Bilmes and Katrin Kirchhoff},
  title={{Attention shift decoding for conversational speech recognition}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1493--1496},
  doi={10.21437/Interspeech.2007-432}
}