Sixth International Conference on Spoken Language Processing (ICSLP 2000)
October 16-20, 2000
Integrating Multimodal Language Processing with Speech Recognition
Srinivas Bangalore, Michael Johnston
AT&T Labs Research, Shannon Laboratory,
Florham Park, NJ, USA
One of the critical challenges facing next-generation
human-computer interfaces is the development of effective
language processing techniques for utterances distributed over
multiple input modes such as speech, touch, and gesture.
Finite-state models for parsing, understanding, and integration
of multimodal input are efficient, enable tight coupling of
multimodal language processing with speech recognition, and
provide a general probabilistic framework for multimodal
ambiguity resolution. We describe an experiment
demonstrating that this tight coupling improves speech
recognition performance on clean speech and under different
levels of background noise. On clean speech, our approach
yields an average 23% relative reduction in sentence error rate.
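
To make the finite-state integration idea concrete, here is a minimal sketch
(plain Python, no FST toolkit) of how a gesture-constrained multimodal grammar
can be composed with a speech recognition lattice so that a pointing gesture
reranks competing word hypotheses. This is an illustration only, not the
system evaluated in the paper: the states, vocabulary, costs, and meaning
symbols are invented for the example, and epsilon transitions are omitted.

# Transducers: state -> list of (input_label, output_label, cost, next_state),
# where cost is a negative log probability.
from collections import defaultdict
import heapq

def compose(t1, t2, start1, start2, finals1, finals2):
    # Standard transducer composition (no epsilons): an arc a:b in t1
    # pairs with an arc b:c in t2 to give an arc a:c with summed cost.
    arcs = defaultdict(list)
    start = (start1, start2)
    stack, seen = [start], {start}
    while stack:
        s1, s2 = stack.pop()
        for (i1, o1, w1, n1) in t1.get(s1, []):
            for (i2, o2, w2, n2) in t2.get(s2, []):
                if o1 == i2:
                    nxt = (n1, n2)
                    arcs[(s1, s2)].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    finals = {(f1, f2) for f1 in finals1 for f2 in finals2 if (f1, f2) in seen}
    return dict(arcs), start, finals

def best_path(arcs, start, finals):
    # Cheapest path through the composed machine (Dijkstra over summed costs).
    heap, done = [(0.0, start, [])], set()
    while heap:
        cost, state, path = heapq.heappop(heap)
        if state in finals:
            return cost, path
        if state in done:
            continue
        done.add(state)
        for (i, o, w, nxt) in arcs.get(state, []):
            heapq.heappush(heap, (cost + w, nxt, path + [(i, o)]))
    return None

# Speech lattice (word acceptor with acoustic costs): taken alone, the
# recognizer slightly prefers "purse" (0.9) over "person" (1.2).
lattice = {
    0: [("email", "email", 0.1, 1)],
    1: [("this", "this", 0.2, 2)],
    2: [("person", "person", 1.2, 3), ("purse", "purse", 0.9, 3)],
}

# Gesture-constrained grammar: a pointing gesture selected a person object,
# so only word sequences consistent with that gesture are accepted, and
# words are mapped to meaning symbols (all labels here are hypothetical).
grammar = {
    0: [("email", "EMAIL", 0.0, 1)],
    1: [("this", "SELECT", 0.0, 2)],
    2: [("person", "person_obj_1", 0.0, 3)],
}

arcs, start, finals = compose(lattice, grammar, 0, 0, {3}, {3})
cost, path = best_path(arcs, start, finals)
print(round(cost, 2), [word for word, meaning in path])
# prints: 1.5 ['email', 'this', 'person'] -- the gesture prunes "purse",
# so the jointly best hypothesis is "email this person".

A real system would use a weighted FST toolkit, handle epsilon arcs, and take
its costs from the speech and gesture recognizers; the point of the sketch is
only that composition lets gesture information constrain and rerank the speech
recognizer's hypotheses.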
Bangalore, Srinivas / Johnston, Michael (2000):
"Integrating multimodal language processing with speech recognition",
In ICSLP-2000, vol. 2, 126-129.