End-to-End Music Source Separation: Is it Possible in the Waveform Domain?

Francesc Lluís, Jordi Pons, Xavier Serra


Most currently successful source separation techniques use the magnitude spectrogram as input, and therefore discard part of the signal by default: the phase. To avoid omitting potentially useful information, we study the viability of end-to-end models for music source separation, which take into account all the information available in the raw audio signal, including the phase. Although end-to-end music source separation has long been considered almost unattainable, our results confirm that waveform-based models can perform on par with (if not better than) a spectrogram-based deep learning model. Namely: a Wavenet-based model we propose and Wave-U-Net can outperform DeepConvSep, a recent spectrogram-based deep learning model.
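The phase issue the abstract refers to can be seen directly: taking the magnitude of the STFT is lossy, and a signal reconstructed from magnitude alone (here with zero phase) no longer matches the original waveform. A minimal sketch with SciPy, using an assumed toy signal and STFT parameters (440 Hz sine, 16 kHz sample rate, 512-sample frames), not the models from the paper:

```python
import numpy as np
from scipy.signal import stft, istft

# Toy signal (assumed parameters): a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

# The complex STFT keeps both magnitude and phase.
f, frames, Z = stft(x, fs=sr, nperseg=512)
mag, phase = np.abs(Z), np.angle(Z)

# With magnitude AND phase, the inverse STFT recovers the waveform.
_, x_full = istft(mag * np.exp(1j * phase), fs=sr, nperseg=512)

# Magnitude-only pipelines drop `phase`; reconstructing with zero phase
# yields a clearly different waveform.
_, x_mag_only = istft(mag.astype(complex), fs=sr, nperseg=512)

err_full = np.max(np.abs(x_full[: len(x)] - x))
err_mag_only = np.max(np.abs(x_mag_only[: len(x)] - x))
print(err_full, err_mag_only)  # err_full is tiny; err_mag_only is not
```

Spectrogram-based separators typically mask `mag` and reuse the mixture's phase for synthesis, which is exactly the approximation that waveform-domain models avoid.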


 DOI: 10.21437/Interspeech.2019-1177

Cite as: Lluís, F., Pons, J., Serra, X. (2019) End-to-End Music Source Separation: Is it Possible in the Waveform Domain?. Proc. Interspeech 2019, 4619-4623, DOI: 10.21437/Interspeech.2019-1177.


@inproceedings{Lluís2019,
  author={Francesc Lluís and Jordi Pons and Xavier Serra},
  title={{End-to-End Music Source Separation: Is it Possible in the Waveform Domain?}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4619--4623},
  doi={10.21437/Interspeech.2019-1177},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1177}
}