One of the critical challenges facing next-generation human-computer interfaces is the development of effective language processing techniques for utterances distributed over multiple input modes such as speech, touch, and gesture. Finite-state models for parsing, understanding, and integration of multimodal input are efficient, enable tight coupling of multimodal language processing with speech recognition, and provide a general probabilistic framework for multimodal ambiguity resolution. We describe an experiment demonstrating that tightly coupling multimodal language processing with speech recognition improves recognition performance, both on clean speech and at different levels of background noise. On clean speech, our approach yields an average relative sentence error reduction of 23%.
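The tight coupling described above can be pictured as combining the recognizer's acoustic scores with scores from a multimodal grammar conditioned on the accompanying gesture, so that gesture evidence helps resolve speech ambiguity. The short Python sketch below only illustrates that idea via n-best rescoring with a toy gesture-conditioned grammar; the hypotheses, probabilities, grammar, and function names are entirely hypothetical, and the paper's actual approach uses weighted finite-state models rather than this simplified rescoring.

import math

# Illustrative sketch (not the authors' implementation): rescore speech
# recognizer hypotheses with a toy "multimodal grammar" that encodes which
# spoken commands are compatible with an observed gesture.

# Hypothetical n-best output from the recognizer: (word string, acoustic log-prob).
speech_nbest = [
    ("phone numbers for these three restaurants", math.log(0.30)),
    ("show numbers for these free restaurants",   math.log(0.35)),
    ("phone numbers for this restaurant",         math.log(0.20)),
]

def grammar_logprob(words, gesture_arity):
    # Toy grammar: a gesture selecting N entities should co-occur with a
    # spoken phrase of matching cardinality; mismatches are heavily penalized.
    if gesture_arity == 3 and "these three" in words:
        return math.log(0.9)
    if gesture_arity == 1 and "this" in words:
        return math.log(0.9)
    return math.log(0.01)

def rescore(nbest, gesture_arity, weight=1.0):
    # Combine acoustic and gesture-conditioned grammar scores log-linearly
    # and return the best-scoring hypothesis.
    scored = [(words, acoustic + weight * grammar_logprob(words, gesture_arity))
              for words, acoustic in nbest]
    return max(scored, key=lambda pair: pair[1])

# A gesture circling three restaurants promotes the hypothesis containing
# "these three", which the recognizer alone had ranked below the
# misrecognition "these free".
print(rescore(speech_nbest, gesture_arity=3))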
Cite as: Bangalore, S., Johnston, M. (2000) Integrating multimodal language processing with speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 126-129, doi: 10.21437/ICSLP.2000-225
@inproceedings{bangalore00_icslp,
  author    = {Srinivas Bangalore and Michael Johnston},
  title     = {{Integrating multimodal language processing with speech recognition}},
  booktitle = {Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  year      = {2000},
  volume    = {2},
  pages     = {126--129},
  doi       = {10.21437/ICSLP.2000-225}
}