Towards Joint Sound Scene and Polyphonic Sound Event Recognition

Helen L. Bear, Inês Nolasco, Emmanouil Benetos


Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels and use it to demonstrate a novel method for jointly classifying sound scenes and recognizing sound events. We show that a joint approach makes learning more efficient and that, although sound event detection still needs improvement, SED results are robust on a dataset whose sample distribution is skewed towards sound scenes.


DOI: 10.21437/Interspeech.2019-2169

Cite as: Bear, H.L., Nolasco, I., Benetos, E. (2019) Towards Joint Sound Scene and Polyphonic Sound Event Recognition. Proc. Interspeech 2019, 4594-4598, DOI: 10.21437/Interspeech.2019-2169.


@inproceedings{Bear2019,
  author={Helen L. Bear and Inês Nolasco and Emmanouil Benetos},
  title={{Towards Joint Sound Scene and Polyphonic Sound Event Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4594--4598},
  doi={10.21437/Interspeech.2019-2169},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2169}
}