10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

An Audio-Visual Attention System for Online Association Learning

Martin Heckmann, Holger Brandl, Xavier Domont, Bram Bolder, Frank Joublin, Christian Goerick

Honda Research Institute GmbH, Germany

We present an audio-visual attention system for speech-based interaction with a humanoid robot, in which a tutor can teach visual properties or locations (e.g., “left”) together with corresponding, arbitrary speech labels. The attention system segments the acoustic signal, and speech labels are learned from a few repetitions of the label by the tutor. The attention system integrates bottom-up, stimulus-driven saliency calculation (delay-and-sum beamforming, adaptive noise level estimation) with top-down modulation (spectral properties, segment length, movement and interaction status of the robot). We evaluate the performance of different aspects of the system on a small dataset.
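As an illustration of the delay-and-sum beamforming named in the abstract, the following is a minimal sketch of the general technique, not the authors' implementation. The abstract gives no details on microphone geometry, sampling rate, or how delays are obtained, so the function name `delay_and_sum`, the use of integer sample delays, and the zero-padding at the signal edges are all assumptions made for this example.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Align each microphone channel and average them (hypothetical helper).

    signals: array-like of shape (n_mics, n_samples).
    delays:  integer arrival delay in samples for each microphone,
             relative to a reference; assumed |delay| < n_samples.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for channel, d in zip(signals, delays):
        shifted = np.zeros(n_samples)
        if d >= 0:
            # Advance the channel by d samples, zero-padding the tail.
            shifted[:n_samples - d] = channel[d:]
        else:
            # Negative delay: push the channel later, zero-padding the head.
            shifted[-d:] = channel[:n_samples + d]
        out += shifted
    # Averaging keeps the steered source at its original amplitude.
    return out / n_mics


# Toy usage: one pulse reaches the second microphone 3 samples later.
src = np.zeros(32)
src[10] = 1.0
mics = [src.copy(), np.roll(src, 3)]
steered = delay_and_sum(mics, [0, 3])    # delays match the source direction
unsteered = delay_and_sum(mics, [0, 0])  # no steering applied
```

When the delays match the source direction, the channels add coherently; with mismatched delays the pulse energy is spread over several samples, which is the property a saliency computation can exploit to favour sound from the attended direction.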


Bibliographic reference. Heckmann, Martin / Brandl, Holger / Domont, Xavier / Bolder, Bram / Joublin, Frank / Goerick, Christian (2009): "An audio-visual attention system for online association learning", In INTERSPEECH-2009, 2171-2174.