ISCA Archive Interspeech 2009

Detecting audio events for semantic video search

M. Bugalho, J. Portêlo, Isabel Trancoso, T. Pellegrini, Alberto Abad

This paper describes our work on audio event detection, one of our tasks in the European project VIDIVIDEO. Preliminary experiments with a small corpus of sound effects showed the potential of this type of corpus for training purposes. Here we report experiments with SVM classifiers and different features, using a 290-hour corpus of sound effects, which allowed us to build detectors for almost 50 semantic concepts. Although these detectors perform quite well on the development set (average F-measure of 0.87), preliminary experiments on documentaries and films showed that the task is much harder in real-life videos, which so often include overlapping audio events.


doi: 10.21437/Interspeech.2009-335

Cite as: Bugalho, M., Portêlo, J., Trancoso, I., Pellegrini, T., Abad, A. (2009) Detecting audio events for semantic video search. Proc. Interspeech 2009, 1151-1154, doi: 10.21437/Interspeech.2009-335

@inproceedings{bugalho09_interspeech,
  author={M. Bugalho and J. Portêlo and Isabel Trancoso and T. Pellegrini and Alberto Abad},
  title={{Detecting audio events for semantic video search}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1151--1154},
  doi={10.21437/Interspeech.2009-335}
}