This paper proposes an efficient approach for the automatic joint detection and temporal localization of acoustic events in single-channel audio using random forest regression. The audio signal is decomposed into densely overlapping superframes, each annotated with an event class label and its displacements to the temporal onset and offset of the event. Using this displacement information, a multivariate random forest regression model is learned for each event category, mapping a superframe to continuous estimates of the event's onset and offset positions. In addition, two random forest classifiers are trained to distinguish superframes of the background and of the different event categories. At test time, once category-specific superframes have been detected by the classifiers, the learned regressor provides estimates of the onset and offset times of the corresponding event. Besides the novelty of posing event detection and localization as a regression problem, a quantitative evaluation on the ITC-Irst database of highly variable acoustic events demonstrates the efficiency and potential of the proposed approach.
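The classify-then-regress pipeline the abstract describes can be sketched with off-the-shelf random forests. This is a minimal illustration, not the authors' implementation: the features, the number of superframes, the frame hop, and the displacement ranges below are all hypothetical placeholders (real superframes would carry acoustic features such as stacked MFCCs), and the aggregation of per-superframe estimates is simplified to a mean.

```python
# Sketch of joint detection and localization with regression forests.
# Assumptions: synthetic 20-dim "superframe" features, one event class vs.
# background, displacements measured in superframes, 10 ms hop.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
hop = 0.01  # hypothetical superframe hop in seconds

# Synthetic training superframes: event vs. background.
X_event = rng.normal(1.0, 1.0, size=(200, 20))
X_bg = rng.normal(-1.0, 1.0, size=(200, 20))

# Each event superframe is annotated with its displacements (in superframes)
# to the event's onset and offset -- the regression targets.
Y = np.column_stack([rng.uniform(0, 50, size=200),   # distance back to onset
                     rng.uniform(0, 50, size=200)])  # distance forward to offset

# Classifier: separate event superframes from background superframes.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(np.vstack([X_event, X_bg]), np.r_[np.ones(200), np.zeros(200)])

# Multivariate regressor: map an event superframe to (onset, offset) displacements.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_event, Y)

# Test time: keep superframes classified as the event, regress their
# displacements, and convert each to absolute onset/offset estimates.
X_test = rng.normal(1.0, 1.0, size=(5, 20))
centres = np.arange(5) * hop               # superframe centre times
mask = clf.predict(X_test) == 1
disp = reg.predict(X_test[mask])           # shape (n_detected, 2)
onsets = centres[mask] - disp[:, 0] * hop
offsets = centres[mask] + disp[:, 1] * hop
print(onsets.mean(), offsets.mean())       # aggregated event boundaries
```

In the paper's setting one such regressor and detector would exist per event category, and the per-superframe estimates would be aggregated more robustly than a plain mean.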
Bibliographic reference: Phan, Huy / Maaß, Marco / Mazur, Radoslaw / Mertins, Alfred (2014): "Acoustic event detection and localization with regression forests", in Proc. INTERSPEECH 2014, pp. 2524-2528.