INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Acoustic Event Detection and Localization with Regression Forests

Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins

Universität zu Lübeck, Germany

This paper proposes an approach for efficient, automatic joint detection and localization of single-channel acoustic events using random forest regression. The audio signals are decomposed into multiple densely overlapping superframes, each annotated with an event class label and its displacements to the temporal starting and ending points of the event. Using the displacement information, a multivariate random forest regression model is learned for each event category to map each superframe to continuous estimates of the onset and offset locations of the event. In addition, two classifiers are trained using random forest classification to classify superframes of background and different event categories. At test time, once category-specific superframes have been detected by the classifiers, the learned regressor provides estimates of the onset and offset locations in time of the corresponding event. Posing event detection and localization as a regression problem is novel, and the quantitative evaluation on the ITC-Irst database of highly variable acoustic events shows the efficiency and potential of the proposed approach.
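The superframe annotation and the way displacement estimates map back to event boundaries can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the centre-based labeling rule, the use of the median to combine per-superframe estimates, and the `door_knock` label are all assumptions made for illustration. The random forest regressor itself is omitted; the sketch only assumes some model that returns an (onset displacement, offset displacement) pair per superframe.

```python
from statistics import median


def make_superframes(n_samples, length, hop):
    """Return (start, end) sample indices of densely overlapping superframes."""
    return [(s, s + length) for s in range(0, n_samples - length + 1, hop)]


def annotate(superframes, onset, offset, label):
    """Attach a class label and onset/offset displacements to each superframe.

    Assumption: a superframe belongs to the event if its centre lies inside
    [onset, offset]; all other superframes are labeled as background.
    """
    rows = []
    for start, end in superframes:
        centre = (start + end) / 2
        if onset <= centre <= offset:
            # Displacements from the superframe centre to the event boundaries.
            rows.append((start, end, label, centre - onset, offset - centre))
        else:
            rows.append((start, end, "background", None, None))
    return rows


def estimate_boundaries(centres, predicted):
    """Combine per-superframe displacement predictions into one estimate.

    Assumption: the median of the individual estimates is used; the abstract
    does not state how the estimates are aggregated.
    """
    onsets = [c - d_on for c, (d_on, _) in zip(centres, predicted)]
    offsets = [c + d_off for c, (_, d_off) in zip(centres, predicted)]
    return median(onsets), median(offsets)


# Hypothetical example: a 1000-sample signal with one event from 300 to 600.
frames = make_superframes(n_samples=1000, length=200, hop=50)
rows = annotate(frames, onset=300, offset=600, label="door_knock")
event_rows = [r for r in rows if r[2] != "background"]
centres = [(r[0] + r[1]) / 2 for r in event_rows]
# Stand-in for regressor output: the true displacements are reused here.
predictions = [(r[3], r[4]) for r in event_rows]
estimated_onset, estimated_offset = estimate_boundaries(centres, predictions)
```

With true displacements standing in for the regressor output, the estimate recovers the annotated boundaries exactly; with a learned model, each detected superframe would contribute a noisy estimate and the aggregation step would smooth them.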


Bibliographic reference. Phan, Huy / Maaß, Marco / Mazur, Radoslaw / Mertins, Alfred (2014): "Acoustic event detection and localization with regression forests", in INTERSPEECH-2014, 2524-2528.