PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation

Naoya Takahashi, Purvi Agrawal, Nabarun Goswami, Yuki Mitsufuji


Previous research on audio source separation based on deep neural networks (DNNs) has mainly focused on estimating the magnitude spectra of the target sources; the phase of the mixture signal is then typically combined with the estimated magnitude spectra in an ad hoc way. Although recovering the target phase is believed to be important for improving separation quality, the periodic nature of phase is difficult to handle with a regression approach. Phase unwrapping is one way to eliminate phase discontinuities; however, it increases the range of values with each unwrap, making the phase difficult for DNNs to model. To overcome this difficulty, we propose treating phase estimation as a classification problem by discretizing phase values and assigning class indices to them. Experimental results show that our classification-based approach 1) successfully recovers the phase of the target source in the discretized domain, 2) improves the signal-to-distortion ratio (SDR) over the regression-based approach in both a speech enhancement task and a music source separation (MSS) task, and 3) outperforms the state of the art in MSS.
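The core idea of discretizing phase into classes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the number of classes (here 8) and the uniform binning of the wrapped phase range [-π, π) are illustrative assumptions.

```python
import numpy as np

def discretize_phase(phase, num_classes=8):
    # Map wrapped phase values in [-pi, pi) to integer class indices
    # 0 .. num_classes-1 (uniform bins; num_classes is an illustrative choice).
    bins = (phase + np.pi) / (2.0 * np.pi) * num_classes
    return np.floor(bins).astype(int) % num_classes

def dediscretize_phase(indices, num_classes=8):
    # Recover a continuous phase value from a class index by taking
    # the center of the corresponding bin.
    width = 2.0 * np.pi / num_classes
    return -np.pi + (indices + 0.5) * width

# The classifier's targets are these indices; at synthesis time the
# predicted index is mapped back to a phase value via the bin center.
phase = np.array([-np.pi, -1.0, 0.0, 1.0, np.pi - 1e-6])
idx = discretize_phase(phase)
approx = dediscretize_phase(idx)
# Circular quantization error is bounded by half a bin width (pi / num_classes).
err = np.abs(np.angle(np.exp(1j * (approx - phase))))
assert np.max(err) <= np.pi / 8 + 1e-9
```

Framing the problem this way sidesteps phase wrapping entirely: class indices have no discontinuity at ±π to regress across, at the cost of a bounded quantization error that shrinks as the number of classes grows.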


DOI: 10.21437/Interspeech.2018-1773

Cite as: Takahashi, N., Agrawal, P., Goswami, N., Mitsufuji, Y. (2018) PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation. Proc. Interspeech 2018, 2713-2717, DOI: 10.21437/Interspeech.2018-1773.


@inproceedings{Takahashi2018,
  author={Naoya Takahashi and Purvi Agrawal and Nabarun Goswami and Yuki Mitsufuji},
  title={PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2713--2717},
  doi={10.21437/Interspeech.2018-1773},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1773}
}