A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation

Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu


Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; the solutions developed are mostly in the frequency domain. Recently, a raw audio waveform separation network (TasNet) was introduced for single-channel data, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared with the state-of-the-art frequency-domain solution. In this study, we incorporate effective components of TasNet into a frequency-domain separation method and compare the two approaches in alternative scenarios. We introduce a solution for directly optimizing the separation criterion in frequency-domain networks. In addition to objective and subjective speech separation measurements, we evaluate separation performance on a speech recognition task. We also study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency-domain and time-domain separators that utilize spectral, spatial, and speaker location information. For our experiments, we simulated a multi-channel, spatialized, reverberant WSJ0-2mix dataset. Our experimental results show that spectrogram separation can achieve competitive performance with better network design. The multi-channel framework is also shown to improve on single-channel performance by up to +35.5% and +46% relative in terms of WER and SDR, respectively.
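
For readers unfamiliar with the Si-SNR metric named in the abstract, the following is a minimal sketch of how it is commonly computed for a single estimated source against its reference. It is not taken from the paper: the function name `si_snr`, the `eps` stabilizer, and the mean-removal step are assumptions based on the usual definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual).

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative sketch only; `eps` is an assumed numerical stabilizer.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]

    # Whatever is left after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(v * v for v in e_noise) + eps
    return 10.0 * math.log10(signal_energy / noise_energy + eps)


# Toy usage: a sinusoidal "reference" and two estimates with different error levels.
reference = [math.sin(0.1 * i) for i in range(200)]
good_estimate = [r + 0.1 * math.cos(0.37 * i) for i, r in enumerate(reference)]
poor_estimate = [r + 0.5 * math.cos(0.37 * i) for i, r in enumerate(reference)]

print(si_snr(good_estimate, reference))
print(si_snr(poor_estimate, reference))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by a constant leaves the score essentially unchanged, which is the property that distinguishes Si-SNR from a plain SNR.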


DOI: 10.21437/Interspeech.2019-3181

Cite as: Bahmaninezhad, F., Wu, J., Gu, R., Zhang, S., Xu, Y., Yu, M., Yu, D. (2019) A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation. Proc. Interspeech 2019, 4574-4578, DOI: 10.21437/Interspeech.2019-3181.


@inproceedings{Bahmaninezhad2019,
  author={Fahimeh Bahmaninezhad and Jian Wu and Rongzhi Gu and Shi-Xiong Zhang and Yong Xu and Meng Yu and Dong Yu},
  title={{A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4574--4578},
  doi={10.21437/Interspeech.2019-3181},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3181}
}