10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Detecting Audio Events for Semantic Video Search

M. Bugalho (1), J. Portêlo (2), Isabel Trancoso (1), T. Pellegrini (2), Alberto Abad (2)

(1) INESC-ID Lisboa/IST, Portugal
(2) INESC-ID Lisboa, Portugal

This paper describes our work on audio event detection, one of our tasks in the European project VIDIVIDEO. Preliminary experiments with a small corpus of sound effects had shown the potential of this type of corpus for training purposes. Here we report experiments with SVM classifiers and different features on a 290-hour corpus of sound effects, which allowed us to build detectors for almost 50 semantic concepts. Although these detectors perform quite well on the development set (average F-measure of 0.87), preliminary experiments on documentaries and films showed that the task is much harder in real-life videos, which often include overlapping audio events.
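
For readers unfamiliar with the metric, the following is a minimal sketch of how an average F-measure over several concept detectors could be computed. It assumes the balanced F-measure (F1) and an unweighted mean across concepts; the abstract does not state which variant or averaging scheme was used. The function names, concept names, and counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true-positive, false-positive
    and false-negative counts. Assumption: F1, not a weighted F-beta."""
    if tp == 0:
        # No correct detections: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(counts):
    """Unweighted mean of the per-concept F-measures.
    Assumption: each concept contributes equally to the average."""
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two illustrative concepts;
# these numbers are made up and do not come from the paper.
example_counts = {
    "concept_a": (90, 10, 10),
    "concept_b": (80, 20, 20),
}

if __name__ == "__main__":
    for name, (tp, fp, fn) in example_counts.items():
        print(name, f_measure(tp, fp, fn))
    print("average", average_f_measure(example_counts))
```

Averaging per concept rather than pooling all counts keeps rare concepts from being dominated by frequent ones, which seems a plausible choice when there are almost 50 detectors, though the paper itself would need to be consulted for the exact procedure.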


Bibliographic reference. Bugalho, M. / Portêlo, J. / Trancoso, Isabel / Pellegrini, T. / Abad, Alberto (2009): "Detecting audio events for semantic video search", In INTERSPEECH-2009, 1151-1154.