Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network

Yi Luo, Nima Mesgarani


We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in the task of real-time single-channel speech dereverberation. Unlike systems that take a time-frequency representation of the audio as input, TasNet learns an adaptive front-end, a time-domain convolutional non-negative autoencoder, that replaces the time-frequency representation. We show that by formulating the dereverberation problem as a denoising problem in which the direct path is separated from the reverberation, a TasNet denoising autoencoder can outperform a deep LSTM baseline that takes log-power magnitude spectrograms as input, in both causal and non-causal settings. We further show that adjusting the stride size in the convolutional autoencoder improves both dereverberation and separation performance.
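To make the adaptive front-end concrete, the sketch below illustrates the general idea of a time-domain convolutional non-negative autoencoder: the waveform is framed with a stride, each frame is projected onto a set of basis filters with a ReLU to keep the weights non-negative, and a linear decoder reconstructs by overlap-add. This is a minimal pure-Python illustration, not the paper's implementation; the filter values are random stand-ins for learned bases, and the window length `L`, stride `S`, and number of filters `N` are illustrative assumptions, not the paper's hyperparameters.

```python
# Hedged sketch of a TasNet-style time-domain convolutional autoencoder
# front-end. Random "learned" filters; L, S, N are illustrative only.
import math
import random

random.seed(0)

L = 8   # window (filter) length in samples (assumed, for illustration)
S = 4   # stride; the paper reports that the stride size matters
N = 16  # number of basis filters (assumed, for illustration)

# Random stand-ins for learned encoder and decoder bases.
enc_basis = [[random.gauss(0, 1) for _ in range(L)] for _ in range(N)]
dec_basis = [[random.gauss(0, 1) for _ in range(L)] for _ in range(N)]

def encode(x):
    """Frame x with stride S and compute non-negative (ReLU) weights
    on each basis filter -- the adaptive front-end replacing the STFT."""
    frames = [x[t:t + L] for t in range(0, len(x) - L + 1, S)]
    weights = []
    for frame in frames:
        w = [max(0.0, sum(f * b for f, b in zip(frame, basis)))
             for basis in enc_basis]
        weights.append(w)
    return weights

def decode(weights, length):
    """Linear decoder: weighted sum of decoder bases, overlap-added."""
    y = [0.0] * length
    for i, w in enumerate(weights):
        start = i * S
        for n in range(N):
            for j in range(L):
                if start + j < length:
                    y[start + j] += w[n] * dec_basis[n][j]
    return y

# Toy usage: encode and decode a short waveform.
x = [math.sin(0.1 * t) for t in range(64)]
W = encode(x)           # (num_frames, N) non-negative weights
y = decode(W, len(x))
print(len(W), len(W[0]))  # frame count x basis count
```

Halving the stride `S` doubles the number of frames, giving the separation network a finer time resolution at the cost of more computation, which is the trade-off the abstract alludes to when it says adjusting the stride affects performance.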


DOI: 10.21437/Interspeech.2018-2290

Cite as: Luo, Y., Mesgarani, N. (2018) Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network. Proc. Interspeech 2018, 342-346, DOI: 10.21437/Interspeech.2018-2290.


@inproceedings{Luo2018,
  author={Yi Luo and Nima Mesgarani},
  title={Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={342--346},
  doi={10.21437/Interspeech.2018-2290},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2290}
}