Rare Sound Event Detection Using Deep Learning and Data Augmentation

Yanping Chen, Hongxia Jin


There is increasing interest in smart environments and a growing adoption of smart devices. Smart assistants such as Google Home and Amazon Alexa, although focused on speech, could be extended to identify domestic events in real time and so provide more and better smart functions. Sound event detection aims to detect multiple target sound events that may happen simultaneously. The task is challenging because sound events overlap, target and non-target data are highly imbalanced, and real-world background noise is complicated. In this paper, we propose a unified approach that takes advantage of both deep learning and data augmentation. A convolutional neural network (CNN) is combined with a feed-forward neural network (FNN) to improve detection performance, and a dynamic time warping based data augmentation (DA) method is proposed to address the data imbalance problem. Experiments on several datasets show a more than 7% increase in accuracy compared to state-of-the-art approaches.
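The abstract does not spell out how dynamic time warping is used for augmentation, so the following is only a minimal sketch of the general idea, not the authors' method. It computes a classic DTW alignment between two 1-D sequences and then, as one plausible augmentation step, stretches one sequence onto the time axis of another. The function names `dtw_align` and `warp_to_reference`, the absolute-difference cost, and the averaging rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def dtw_align(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW between two 1-D sequences (absolute-difference cost).

    Returns the cumulative alignment cost and the warping path as a list
    of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal move
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def warp_to_reference(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical augmentation step (an assumption, not from the paper):
    re-time `a` onto the time axis of `b` by averaging the values of `a`
    that the DTW path aligns to each index of `b`.
    """
    _, path = dtw_align(a, b)
    buckets: List[List[float]] = [[] for _ in b]
    for i, j in path:
        buckets[j].append(a[i])
    return [sum(values) / len(values) for values in buckets]
```

In a rare-event setting, a step like `warp_to_reference` could be applied to pairs of examples from the under-represented class to produce new time-warped variants, although whether the paper proceeds this way is not stated here.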


DOI: 10.21437/Interspeech.2019-1985

Cite as: Chen, Y., Jin, H. (2019) Rare Sound Event Detection Using Deep Learning and Data Augmentation. Proc. Interspeech 2019, 619-623, DOI: 10.21437/Interspeech.2019-1985.


@inproceedings{Chen2019,
  author={Yanping Chen and Hongxia Jin},
  title={{Rare Sound Event Detection Using Deep Learning and Data Augmentation}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={619--623},
  doi={10.21437/Interspeech.2019-1985},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1985}
}