Music Source Activity Detection and Separation Using Deep Attractor Network

Rajath Kumar, Yi Luo, Nima Mesgarani


In music signal processing, singing voice detection and music source separation are widely researched topics. Recent progress in deep neural network based source separation has advanced the state-of-the-art performance in vocal and instrument separation, while the problem of joint source activity detection and separation remains unexplored. In this paper, we propose an approach to perform source activity detection using the high-dimensional embedding generated by the Deep Attractor Network (DANet) when trained for music source separation. By defining both tasks jointly, DANet is able to dynamically estimate the number of outputs depending on the active sources. We also propose an Expectation-Maximization (EM) training paradigm for DANet which further improves the separation performance of the original DANet. Experiments show that our network achieves higher source separation performance and comparable source activity detection accuracy compared with a baseline system.
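As a rough illustration of the attractor mechanism the abstract refers to, the sketch below shows how DANet-style attractors and masks can be formed from a time-frequency embedding. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the dimensions, the random embedding `V`, and the ideal binary assignment matrix `Y` are all placeholders for illustration.

```python
import numpy as np

# Assumed toy dimensions: 100 T-F bins, embedding size K=20, C=2 sources.
rng = np.random.default_rng(0)
tf_bins, emb_dim, n_src = 100, 20, 2

V = rng.standard_normal((tf_bins, emb_dim))          # embeddings, one per T-F bin
Y = np.eye(n_src)[rng.integers(0, n_src, tf_bins)]   # ideal binary source assignment

# Attractors: the centroid of the embeddings assigned to each source
# (this is how DANet forms attractors during training).
attractors = (Y.T @ V) / Y.sum(axis=0)[:, None]      # shape (C, K)

# Masks: softmax over the similarity of each embedding to each attractor.
logits = V @ attractors.T                            # shape (TF, C)
masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# The masks for each T-F bin sum to one across sources.
assert np.allclose(masks.sum(axis=1), 1.0)
```

At inference time, when ideal assignments are unavailable, the attractors are instead estimated from the embedding space itself (e.g. by clustering), which is where the EM-style view of the procedure comes in.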


DOI: 10.21437/Interspeech.2018-2326

Cite as: Kumar, R., Luo, Y., Mesgarani, N. (2018) Music Source Activity Detection and Separation Using Deep Attractor Network. Proc. Interspeech 2018, 347-351, DOI: 10.21437/Interspeech.2018-2326.


@inproceedings{Kumar2018,
  author={Rajath Kumar and Yi Luo and Nima Mesgarani},
  title={Music Source Activity Detection and Separation Using Deep Attractor Network},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={347--351},
  doi={10.21437/Interspeech.2018-2326},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2326}
}