ISCA Archive L3DAS 2022

Cross-Stitch Network Based System for Sound Event Localization and Detection in L3DAS22 Challenge

Jisheng Bai, Siwei Huang, Yafei Jia, Mou Wang, Jianfeng Chen

Sound event localization and detection (SELD) is of great potential importance in daily life. Jointly training the two SELD subtasks allows a model to share acoustic knowledge between sound event detection and source localization. In this paper, we propose a novel SELD system based on a two-branch cross-stitch neural network. First, the proposed network takes 4-channel log-Mel energies and 3-channel intensity vectors as the inputs to its two branches. We then incorporate cross-stitch units and a Transformer encoder to share and model the acoustic representations for SELD. In addition, we present a time-domain data augmentation method that effectively improves SELD performance. We evaluated the proposed system on the dataset of the ICASSP 2022 L3DAS22 Challenge Task 2. Results show that our system outperforms the official baseline system by a large margin. An ensemble of several models yields further improvement in the evaluation metrics.
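As a rough illustration of the cross-stitch idea mentioned in the abstract, the sketch below mixes the feature maps of two branches with a 2x2 weight matrix, following the general cross-stitch formulation used in multi-task learning. This page does not give the paper's exact configuration, so the function name `cross_stitch`, the array shapes, and the `alpha` values are assumptions chosen for illustration only; in a trained network `alpha` would normally be a learned parameter rather than a fixed constant.

```python
import numpy as np


def cross_stitch(x_a, x_b, alpha):
    """Linearly mix two same-shaped feature maps with a 2x2 matrix alpha.

    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    """
    alpha = np.asarray(alpha, dtype=float)
    if alpha.shape != (2, 2):
        raise ValueError("alpha must have shape (2, 2)")
    if x_a.shape != x_b.shape:
        raise ValueError("both branches must produce same-shaped features")
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b


# Hypothetical intermediate features (batch, time, channels) from the
# log-Mel branch and the intensity-vector branch; shapes are assumed.
rng = np.random.default_rng(0)
feat_mel = rng.standard_normal((8, 100, 16))
feat_iv = rng.standard_normal((8, 100, 16))

# Assumed mixing weights: mostly keep each branch, share a little.
alpha = [[0.9, 0.1], [0.1, 0.9]]
mixed_mel, mixed_iv = cross_stitch(feat_mel, feat_iv, alpha)
```

With `alpha` set to the identity matrix the two branches stay independent, and with all entries equal to 0.5 both branches receive the same averaged features, so the weights control how much representation is shared between the branches.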


doi: 10.21437/L3DAS.2022-2

Cite as: Bai, J., Huang, S., Jia, Y., Wang, M., Chen, J. (2022) Cross-Stitch Network Based System for Sound Event Localization and Detection in L3DAS22 Challenge. Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing, 6-10, doi: 10.21437/L3DAS.2022-2

@inproceedings{bai22_l3das,
  author={Jisheng Bai and Siwei Huang and Yafei Jia and Mou Wang and Jianfeng Chen},
  title={{Cross-Stitch Network Based System for Sound Event Localization and Detection in L3DAS22 Challenge}},
  year=2022,
  booktitle={Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing},
  pages={6--10},
  doi={10.21437/L3DAS.2022-2}
}