ISCA Archive Interspeech 2013

Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation

Ao Shen, Neil Cooke, Martin Russell

Using information from a person's gaze has the potential to improve ASR performance in acoustically noisy environments, but previous work has yielded only minor improvements. A cache-based class language model adaptation framework is presented in which the cache contains a sequence of gaze events, classes represent visual context and task, and the relative importance of individual gaze events is taken into account. An implementation in a full ASR system is described and evaluated on gaze-speech data recorded in both a quiet and an acoustically noisy environment. Results demonstrate that selectively using gaze events based on their measured characteristics significantly increases the WER improvement on speech recorded in the noisy environment from 6.34% to 10.58%. This work highlights the need to use gaze information selectively, to constrain the redistribution of probability mass between words during adaptation via classes, and to evaluate the system on gaze and speech collected in environments representative of real-world use.
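The abstract's core idea, a cache of relevance-weighted gaze events whose probability mass is redistributed only within visual-context/task classes, can be illustrated with a minimal sketch. This is not the authors' implementation: all identifiers (GazeCacheClassLM, word_to_class, class_word_prob, interp) and the specific weighting and interpolation scheme are hypothetical, chosen only to show the general cache-based class LM adaptation pattern.

from collections import defaultdict

class GazeCacheClassLM:
    """Minimal sketch of cache-based class LM adaptation driven by gaze events.

    Assumed (hypothetical) structure:
    - word_to_class maps vocabulary words to visual-context/task classes.
    - class_word_prob gives P(word | class) within each class.
    - The cache holds recent gaze events, each tagged with a class label and a
      relevance weight (e.g. fixation duration), so more informative gaze
      events contribute more probability mass to their class.
    """

    def __init__(self, base_unigram, word_to_class, class_word_prob, interp=0.8):
        self.base = base_unigram                # P_base(w): baseline LM probability
        self.word_to_class = word_to_class      # w -> class label
        self.class_word_prob = class_word_prob  # (class, w) -> P(w | class)
        self.interp = interp                    # weight on the baseline LM
        self.cache = []                         # list of (class_label, weight)

    def add_gaze_event(self, class_label, weight=1.0):
        """Append a gaze event; weight encodes its measured relevance."""
        self.cache.append((class_label, weight))

    def class_probs(self):
        """P(class | cache): normalised, relevance-weighted class counts."""
        totals = defaultdict(float)
        for label, w in self.cache:
            totals[label] += w
        norm = sum(totals.values())
        return {c: v / norm for c, v in totals.items()} if norm > 0 else {}

    def prob(self, word):
        """Interpolate the baseline LM with the class-constrained cache component."""
        p_base = self.base.get(word, 1e-8)
        cls = self.word_to_class.get(word)
        p_cls = self.class_probs().get(cls, 0.0)
        p_cache = p_cls * self.class_word_prob.get((cls, word), 0.0)
        return self.interp * p_base + (1.0 - self.interp) * p_cache

Confining the cache contribution to words within the gazed-at class is what keeps the adaptation from redistributing probability mass to unrelated vocabulary, which is the constraint the abstract emphasises.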


doi: 10.21437/Interspeech.2013-454

Cite as: Shen, A., Cooke, N., Russell, M. (2013) Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation. Proc. Interspeech 2013, 1844-1848, doi: 10.21437/Interspeech.2013-454

@inproceedings{shen13_interspeech,
  author={Ao Shen and Neil Cooke and Martin Russell},
  title={{Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation}},
  year={2013},
  booktitle={Proc. Interspeech 2013},
  pages={1844--1848},
  doi={10.21437/Interspeech.2013-454}
}