Interspeech'2005 - Eurospeech
We investigate the recognition of spontaneous speech using the focus of visual attention as a secondary cue to speech. In our experiment we collected a corpus of eye and speech data in which one participant describes a geographical map to another while having their eye movements tracked. Using this corpus we characterise the coupling between eye movement and speech. Speech recognition results are presented as a proof of concept for the development of a bimodal ASR system that uses the focus of visual attention to drive a dynamic language model. A marginal improvement in word error rate (WER) is observed.
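The abstract does not specify how visual attention drives the dynamic language model. A minimal illustrative sketch of one common approach, assuming a unigram model where probabilities of words associated with the currently fixated map region are boosted and then renormalised (the function name, `boost` factor, and example vocabulary are all assumptions, not the authors' method):

```python
def bias_lm(base_probs: dict, attended: set, boost: float = 5.0) -> dict:
    """Reweight a unigram LM toward words linked to the fixated region.

    base_probs: word -> probability (sums to 1)
    attended:   words associated with the current gaze fixation
    boost:      multiplicative weight applied to attended words
    """
    # Scale attended words up, leave the rest unchanged.
    weighted = {w: p * (boost if w in attended else 1.0)
                for w, p in base_probs.items()}
    # Renormalise so the biased distribution sums to 1 again.
    total = sum(weighted.values())
    return {w: p / total for w, p in weighted.items()}


# Hypothetical map-task vocabulary for illustration only.
base = {"river": 0.2, "hill": 0.3, "road": 0.5}
biased = bias_lm(base, attended={"river"})
```

Under this sketch, a fixation on the river raises `P(river)` at the expense of the other words, which is the intuition behind coupling gaze to the recogniser's language model.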
Bibliographic reference. Cooke, Neil / Russell, Martin (2005): "Using the focus of visual attention to improve spontaneous speech recognition", In INTERSPEECH-2005, 1213-1216.