ISCA Archive Interspeech 2009

An audio-visual attention system for online association learning

Martin Heckmann, Holger Brandl, Xavier Domont, Bram Bolder, Frank Joublin, Christian Goerick

We present an audio-visual attention system for speech-based interaction with a humanoid robot, in which a tutor can teach visual properties/locations (e.g., “left”) and corresponding, arbitrary speech labels. The acoustic signal is segmented via the attention system, and speech labels are learned from a few repetitions of the label by the tutor. The attention system integrates bottom-up, stimulus-driven saliency calculation (delay-and-sum beamforming, adaptive noise level estimation) and top-down modulation (spectral properties, segment length, movement and interaction status of the robot). We evaluate the performance of different aspects of the system on a small dataset.
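The two bottom-up components named in the abstract, delay-and-sum beamforming and adaptive noise level estimation, can be illustrated with a minimal, generic sketch. It is not the authors' implementation. It assumes integer-sample steering delays, per-frame energies computed elsewhere, and a simple asymmetric (slow-rise, fast-fall) noise floor tracker. The function names `delay_and_sum`, `track_noise_floor` and `salient_frames`, the rate constants and the saliency ratio are all hypothetical choices for illustration.

```python
def delay_and_sum(channels, delays):
    """Generic delay-and-sum beamformer (illustrative, integer delays only).

    channels: list of equal-length sample lists, one per microphone.
    delays: non-negative integer delay in samples applied to each channel,
            chosen so that a source in the steering direction is time-aligned.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for t in range(n):
            src = t - d  # delayed channel: y[t] = x[t - d]
            if 0 <= src < n:
                out[t] += x[src]
    # Averaging keeps the aligned (steered) signal at unit gain.
    return [v / len(channels) for v in out]


def track_noise_floor(frame_energies, rise=0.01, fall=0.5):
    """Hypothetical adaptive noise level estimate.

    The floor follows energy decreases quickly (fall) but rises only slowly
    (rise), so short loud events such as speech barely lift the estimate.
    """
    floor = frame_energies[0]
    floors = []
    for e in frame_energies:
        rate = rise if e > floor else fall
        floor += rate * (e - floor)
        floors.append(floor)
    return floors


def salient_frames(frame_energies, floors, ratio=4.0):
    """Indices of frames whose energy exceeds the noise floor by `ratio`."""
    return [i for i, (e, f) in enumerate(zip(frame_energies, floors))
            if e > ratio * f]
```

For example, an impulse reaching microphone 0 at sample 2 and microphone 1 at sample 5 is aligned by delaying channel 0 by 3 samples: `delay_and_sum([[0,0,1,0,0,0,0,0], [0,0,0,0,0,1,0,0]], [3, 0])` has its only nonzero value, 1.0, at index 5. The asymmetric tracker is one common way to estimate a noise level without a separate speech/non-speech decision, since a brief energy burst moves the floor only slightly.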


doi: 10.21437/Interspeech.2009-619

Cite as: Heckmann, M., Brandl, H., Domont, X., Bolder, B., Joublin, F., Goerick, C. (2009) An audio-visual attention system for online association learning. Proc. Interspeech 2009, 2171-2174, doi: 10.21437/Interspeech.2009-619

@inproceedings{heckmann09_interspeech,
  author={Martin Heckmann and Holger Brandl and Xavier Domont and Bram Bolder and Frank Joublin and Christian Goerick},
  title={{An audio-visual attention system for online association learning}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2171--2174},
  doi={10.21437/Interspeech.2009-619}
}