Finding Regions of Interest from Multimodal Human-Robot Interactions

Pablo Azagra, Javier Civera, Ana C. Murillo


Learning new concepts, such as object models, from human-robot interactions entails different recognition capabilities on a robotic platform. This work proposes a hierarchical approach that exploits multimodal data to address the extra challenges of natural interaction scenarios. First, speech-guided recognition of the type of interaction taking place is performed. This first step facilitates the subsequent segmentation of the visual information relevant to learning the target object model. Our approach includes three complementary strategies to find Regions of Interest (RoIs), depending on the interaction type: Point, Show or Speak. We run an exhaustive validation of the proposed strategies on the recently published Multimodal Human-Robot Interaction dataset [1]. The presented pipeline builds on the one proposed with the dataset and provides a more complete baseline for target object segmentation across all its recordings.
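The hierarchical structure described above (a speech-guided classifier first labels the interaction type, and that label then selects one of three RoI-finding strategies) can be sketched as follows. This is a minimal illustrative sketch only: the keyword-based classifier, the function names, and the placeholder bounding boxes are assumptions for exposition, not the authors' implementation.

```python
def classify_interaction(transcript: str) -> str:
    """Toy speech-guided classifier mapping an utterance to Point/Show/Speak.
    A keyword heuristic stands in for the paper's speech-based recognition."""
    text = transcript.lower()
    if "this" in text or "that" in text:
        return "Point"   # deictic phrasing suggests a pointing gesture
    if "look" in text or "here is" in text:
        return "Show"    # phrasing suggests presenting an object to the camera
    return "Speak"       # purely verbal description, no gesture cue

def find_roi(interaction: str, frame_size=(640, 480)):
    """Dispatch to a per-interaction RoI strategy.
    Boxes are (x, y, width, height) placeholders, not real segmentations."""
    w, h = frame_size
    strategies = {
        # Point: region near where the pointing direction meets the scene
        "Point": (w // 2, h // 2, 100, 100),
        # Show: central region where a held-up object would appear
        "Show": (w // 4, h // 4, w // 2, h // 2),
        # Speak: no visual cue available, fall back to the full frame
        "Speak": (0, 0, w, h),
    }
    return strategies[interaction]

# Interaction type chosen from speech drives which RoI strategy runs.
roi = find_roi(classify_interaction("Look, here is my mug"))
```

The design point illustrated is the hierarchy itself: resolving the interaction type first keeps each RoI strategy simple, since each one only has to handle the visual cues its interaction type guarantees.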


DOI: 10.21437/GLU.2017-15

Cite as: Azagra, P., Civera, J., Murillo, A.C. (2017) Finding Regions of Interest from Multimodal Human-Robot Interactions. Proc. GLU 2017 International Workshop on Grounding Language Understanding, 73-77, DOI: 10.21437/GLU.2017-15.


@inproceedings{Azagra2017,
  author={Pablo Azagra and Javier Civera and Ana C. Murillo},
  title={Finding Regions of Interest from Multimodal Human-Robot Interactions},
  year=2017,
  booktitle={Proc. GLU 2017 International Workshop on Grounding Language Understanding},
  pages={73--77},
  doi={10.21437/GLU.2017-15},
  url={http://dx.doi.org/10.21437/GLU.2017-15}
}