ISCA Archive Odyssey 2014

Large Scale Learning of a Joint Embedding Space

Samy Bengio

Rich document annotation is the task of attaching textual semantics to documents such as images, videos, and music by ranking a large set of candidate annotations according to how well each corresponds to a given document. In the large-scale setting, there can be millions of such rich documents to process and hundreds of thousands of distinct potential annotations. To address this task, we propose to build a so-called "embedding space" into which both documents and annotations can be automatically projected. In such a space, one can find the annotations nearest to a given image, video, or piece of music, or the annotations most similar to a given annotation. One can even build a semantic tree over these annotations that reflects how similar concepts (annotations) are to each other with respect to the characteristics of their rich documents. We propose a new, efficient learning-to-rank approach that scales to such datasets, and we show annotation results on image and music databases.
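The idea of a joint embedding space can be illustrated with a minimal sketch. It is not the approach presented in the talk, whose details the abstract does not give. It assumes a linear map V from document features into the embedding space, one embedding row per annotation in W, dot-product similarity, and a generic pairwise margin ranking update with a single randomly sampled negative annotation. The names rank_annotations and train, and all hyperparameters, are hypothetical.

```python
import numpy as np

def rank_annotations(x, V, W):
    """Return annotation indices sorted from best to worst match for document x."""
    z = V @ x          # project the document features into the embedding space
    scores = W @ z     # dot-product similarity with every annotation embedding
    return np.argsort(-scores)

def train(X, y, n_labels, dim=8, lr=0.05, margin=1.0, epochs=200, seed=0):
    """Fit V (documents -> space) and W (annotation embeddings) by SGD.

    Assumption: a plain pairwise hinge loss with one sampled negative per step,
    used here only as a stand-in for a learning-to-rank objective.
    """
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(dim, X.shape[1]))
    W = rng.normal(scale=0.1, size=(n_labels, dim))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, pos = X[i], int(y[i])
            # Sample a negative annotation different from the correct one.
            neg = int(rng.integers(n_labels - 1))
            if neg >= pos:
                neg += 1
            z = V @ x
            # Update only when the correct annotation does not win by the margin.
            if margin - W[pos] @ z + W[neg] @ z > 0:
                diff = W[pos] - W[neg]   # computed before W is modified
                W[pos] += lr * z
                W[neg] -= lr * z
                V += lr * np.outer(diff, x)
    return V, W

if __name__ == "__main__":
    # Toy data: three documents with one-hot features, one annotation each.
    X = np.eye(3)
    y = np.array([0, 1, 2])
    V, W = train(X, y, n_labels=3)
    for x, label in zip(X, y):
        print(label, rank_annotations(x, V, W))
```

Because documents and annotations live in the same space, the same similarity can also compare two annotation rows of W directly, which is what makes "annotations similar to a given annotation" a nearest-neighbour query in this sketch.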


Cite as: Bengio, S. (2014) Large Scale Learning of a Joint Embedding Space. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)

@inproceedings{bengio14_odyssey,
  author={Samy Bengio},
  title={{Large Scale Learning of a Joint Embedding Space}},
  year=2014,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)}
}