The prevalent state of the art in spoken language understanding by spoken dialog systems is both modular and whole-utterance. It is modular in that incoming utterances are processed by independent components that handle different aspects, such as acoustics, syntax, semantics, and intention/goal recognition. It is whole-utterance in that each component completes its work on an entire utterance before handing it off to the next component. However, a growing body of evidence suggests that humans do not process language this way. Rather, people process speech by rapidly integrating constraints from multiple sources of knowledge and multiple linguistic levels incrementally, as the utterance unfolds. In this paper we describe ongoing work aimed at developing an architecture that will allow machines to understand spoken language in a similar way. This approach is promising for two reasons: 1) it more accurately reflects contemporary models of human language understanding, and 2) it yields empirical improvements, including improved parsing performance.
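To make the contrast the abstract draws concrete, the sketch below sets a whole-utterance pipeline next to an incremental one in which each new word is propagated through all components immediately. This is an illustrative sketch only: the component names, data structures, and interfaces are hypothetical and are not taken from the paper or the authors' system.

```python
# Illustrative sketch; names and interfaces are invented for exposition
# and do not describe the authors' actual architecture.

from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A partial interpretation built up as the utterance unfolds."""
    words: list = field(default_factory=list)
    parse_fragments: list = field(default_factory=list)
    candidate_referents: list = field(default_factory=list)


def whole_utterance_pipeline(words, parser, interpreter):
    """Conventional design: each component finishes the entire utterance
    before handing its output to the next component."""
    parse = parser(words)        # syntax sees the complete utterance first
    return interpreter(parse)    # semantics starts only after parsing ends


def incremental_pipeline(word_stream, parser_step, interpreter_step):
    """Incremental design: every new word is pushed through all components
    at once, so higher-level constraints (e.g. which referents remain
    plausible) are available before the utterance is complete."""
    hyp = Hypothesis()
    for word in word_stream:
        hyp.words.append(word)
        hyp.parse_fragments = parser_step(hyp)           # extend partial parse
        hyp.candidate_referents = interpreter_step(hyp)  # reinterpret what has been heard so far
        yield hyp                                        # interpretation available mid-utterance
```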
Cite as: Aist, G., Allen, J., Campana, E., Galescu, L., Gallo, C.A.G., Stoness, S.C., Swift, M., Tanenhaus, M. (2006) Software architectures for incremental understanding of human speech. Proc. Interspeech 2006, paper 1869-Wed2FoP.5, doi: 10.21437/Interspeech.2006-528
@inproceedings{aist06_interspeech,
  author={Gregory Aist and James Allen and Ellen Campana and Lucian Galescu and Carlos A. Gómez Gallo and Scott C. Stoness and Mary Swift and Michael Tanenhaus},
  title={{Software architectures for incremental understanding of human speech}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1869-Wed2FoP.5},
  doi={10.21437/Interspeech.2006-528}
}