Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels and use it to demonstrate a novel method for jointly classifying sound scenes and recognizing sound events. We show that a joint approach makes learning more efficient. Whilst sound event detection still needs improvement, SED results are robust on a dataset whose sample distribution is skewed towards sound scenes.
Cite as: Bear, H.L., Nolasco, I., Benetos, E. (2019) Towards Joint Sound Scene and Polyphonic Sound Event Recognition. Proc. Interspeech 2019, 4594-4598, doi: 10.21437/Interspeech.2019-2169
@inproceedings{bear19_interspeech, author={Helen L. Bear and Inês Nolasco and Emmanouil Benetos}, title={{Towards Joint Sound Scene and Polyphonic Sound Event Recognition}}, year=2019, booktitle={Proc. Interspeech 2019}, pages={4594--4598}, doi={10.21437/Interspeech.2019-2169} }