Robust Sound Event Detection in Continuous Audio Environments

Haomin Zhang, Ian McLoughlin, Yan Song


Sound event detection in real-world environments has recently attracted significant research interest because of its applications in popular fields such as machine hearing and automated surveillance, as well as in sound scene understanding. This paper considers continuous robust sound event detection, i.e. detecting multiple overlapping sound events in the presence of different types of interfering noise. First, a standard evaluation task is outlined based upon existing testing data sets for the sound event classification of isolated sounds. The paper then proposes and evaluates the use of spectrogram image features, employing an energy detector to segment sound events, before developing a novel segmentation method based on a Bayesian inference criterion. At the back end, a convolutional neural network is used to classify detected regions, and this combination is compared against several alternative approaches. The proposed method is shown to achieve very good performance compared with current state-of-the-art techniques.
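The abstract's front end segments candidate sound events from a spectrogram with an energy detector before classification. As a rough illustration of that idea (not the authors' implementation), the following hypothetical sketch thresholds per-frame energy and collects contiguous active frames into candidate segments; the threshold and minimum-length parameters are assumptions for the example.

```python
import numpy as np

def energy_segment(spectrogram, threshold_db=-30.0, min_frames=5):
    """Hypothetical energy-detector segmentation of a spectrogram.

    spectrogram: non-negative array of shape (freq_bins, frames).
    Returns a list of (start_frame, end_frame) candidate segments.
    threshold_db and min_frames are illustrative choices, not values
    from the paper.
    """
    # Per-frame energy in dB relative to the loudest frame.
    frame_energy = spectrogram.sum(axis=0)
    energy_db = 10.0 * np.log10(frame_energy / frame_energy.max() + 1e-12)
    active = energy_db > threshold_db

    # Collect contiguous runs of active frames as (start, end) segments,
    # discarding runs shorter than min_frames.
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_frames:
        segments.append((start, len(active)))
    return segments
```

Each returned segment would then be passed to the back-end classifier (a CNN in the paper) for labelling; the paper's Bayesian-inference-criterion segmentation replaces this simple thresholding stage.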


DOI: 10.21437/Interspeech.2016-392

Cite as

Zhang, H., McLoughlin, I., Song, Y. (2016) Robust Sound Event Detection in Continuous Audio Environments. Proc. Interspeech 2016, 2977-2981.

Bibtex
@inproceedings{Zhang+2016,
author={Haomin Zhang and Ian McLoughlin and Yan Song},
title={Robust Sound Event Detection in Continuous Audio Environments},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-392},
url={http://dx.doi.org/10.21437/Interspeech.2016-392},
pages={2977--2981}
}