ISCA Archive L3DAS 2022

Cross-Stitch Network with Adaptive Loss Weightage for Sound Event Localization and Detection

Teck Kai Chan, Rohan Kumar Das

A Sound Event Localization and Detection (SELD) system identifies the type and source of an acoustic event in 3-dimensional space. Typically, such a system is trained using a Multi-Task Learning (MTL) framework, where the propagated loss is a linear combination of the individual task losses. However, the hard-parameter sharing strategy commonly used in an MTL framework has been found to degrade system performance. In addition, tuning the loss combination empirically is time-consuming and may not yield the best weighting. This work proposes a cross-stitch network with a novel attention module that improves the feature representations. Further, we propose the use of a loss balancing algorithm that weighs the loss contributions adaptively, thereby eliminating the need to tune the loss weightage empirically. The proposed system is evaluated on the L3DAS22 challenge dataset as part of our challenge participation and achieves a significant performance improvement of over 20% compared to the state-of-the-art SELDnet. We also note that our system ranked 3rd in the L3DAS22 challenge Task 2 without any data augmentation or external dataset to increase the training samples.
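
As a rough illustration of the two ideas named in the abstract, the sketch below shows a generic cross-stitch unit, which mixes the features of two task branches through a small learnable matrix, together with the fixed linear combination of task losses that an adaptive weighting scheme is meant to replace. This is a minimal sketch and not the authors' implementation: the function names, the branch names `sed` and `doa`, the 2x2 matrix `alpha`, and all numeric values are assumptions made for illustration, and the attention module and the specific loss balancing algorithm are not described in the abstract and are not reproduced here.

```python
from typing import List, Tuple

Vector = List[float]


def cross_stitch(x_a: Vector, x_b: Vector,
                 alpha: List[List[float]]) -> Tuple[Vector, Vector]:
    """Mix two task-specific feature vectors with a 2x2 matrix.

    out_a = alpha[0][0] * x_a + alpha[0][1] * x_b
    out_b = alpha[1][0] * x_a + alpha[1][1] * x_b
    In a trained network, alpha would be a learnable parameter.
    """
    if len(x_a) != len(x_b):
        raise ValueError("both branches must have the same feature size")
    out_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(x_a, x_b)]
    out_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(x_a, x_b)]
    return out_a, out_b


def combined_loss(loss_sed: float, loss_doa: float,
                  w_sed: float, w_doa: float) -> float:
    """Linear combination of two task losses with hand-set weights."""
    return w_sed * loss_sed + w_doa * loss_doa


# Hypothetical features from a detection branch and a localization branch.
sed_features = [1.0, 2.0]
doa_features = [3.0, 4.0]

# Assumed mixing matrix: each branch mostly keeps its own features.
alpha = [[0.9, 0.1],
         [0.1, 0.9]]

mixed_sed, mixed_doa = cross_stitch(sed_features, doa_features, alpha)

# Hand-tuned weights of this kind are what adaptive weighting aims to avoid.
total = combined_loss(loss_sed=0.5, loss_doa=2.0, w_sed=1.0, w_doa=0.5)
```

With an identity matrix for `alpha` the two branches stay fully independent, while off-diagonal entries let each branch borrow features from the other, which is the soft alternative to hard-parameter sharing.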


doi: 10.21437/L3DAS.2022-3

Cite as: Chan, T.K., Das, R.K. (2022) Cross-Stitch Network with Adaptive Loss Weightage for Sound Event Localization and Detection. Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing, 11-15, doi: 10.21437/L3DAS.2022-3

@inproceedings{chan22_l3das,
  author={Teck Kai Chan and Rohan Kumar Das},
  title={{Cross-Stitch Network with Adaptive Loss Weightage for Sound Event Localization and Detection}},
  year=2022,
  booktitle={Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing},
  pages={11--15},
  doi={10.21437/L3DAS.2022-3}
}