Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Integrating Multimodal Language Processing with Speech Recognition

Srinivas Bangalore, Michael Johnston

AT&T Labs Research, Shannon Laboratory, Florham Park, NJ, USA

One of the critical challenges facing next-generation human-computer interfaces concerns the development of effective language processing techniques for utterances distributed over multiple input modes such as speech, touch, and gesture. Finite-state models for parsing, understanding, and integration of multimodal input are efficient, enable tight coupling of multimodal language processing with speech recognition, and provide a general probabilistic framework for multimodal ambiguity resolution. We describe an experiment that demonstrates the effectiveness of tight coupling of multimodal language processing in improving speech recognition performance with clean speech and with different levels of background noise. Our approach yields an average 23% relative sentence error reduction on clean speech.
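The kind of multimodal ambiguity resolution the abstract describes, combining speech-recognition scores with compatibility constraints from another mode, can be illustrated with a toy sketch. This is not the authors' finite-state implementation: the commands, costs, gesture types, and the `integrate` helper below are all invented for illustration, and the full composition of weighted transducers is reduced to rescoring an n-best list against a gesture constraint (an argmin over joint costs, as in a shortest path through a composed machine).

```python
# Toy sketch (invented example, not the paper's system): rescore a
# speech n-best list with a gesture-compatibility cost and pick the
# lowest-cost joint hypothesis.

# Hypothetical recognizer output: (hypothesis string, negative-log cost).
nbest = [
    ("zoom in here", 4.1),
    ("room in here", 3.9),         # acoustically best, but gesture-incompatible
    ("zoom to this hotel", 5.0),
]

# Hypothetical multimodal "grammar": the gesture type each deictic
# command expects. Hypotheses absent from the grammar match no gesture.
grammar = {
    "zoom in here": "location",
    "zoom to this hotel": "hotel",
}

GESTURE_MISMATCH_COST = 100.0  # large cost effectively prunes incompatible paths

def integrate(nbest, gesture_type):
    """Combine each speech cost with a gesture-compatibility penalty and
    return the lowest-cost joint hypothesis."""
    scored = []
    for text, cost in nbest:
        penalty = 0.0 if grammar.get(text) == gesture_type else GESTURE_MISMATCH_COST
        scored.append((cost + penalty, text))
    return min(scored)[1]

print(integrate(nbest, "location"))  # a point gesture on the map overrides
                                     # the acoustically best "room in here"
```

Here the gesture information corrects the recognizer's top hypothesis, which is the mechanism behind the sentence-error reductions the abstract reports: the composed multimodal model reranks (or constrains) the speech lattice rather than trusting the acoustic scores alone.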



Bibliographic reference. Bangalore, Srinivas / Johnston, Michael (2000): "Integrating multimodal language processing with speech recognition", in ICSLP-2000, vol. 2, 126-129.