Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement

Like Hui, Siyuan Ma, Mikhail Belkin


We apply a fast kernel method to mask-based single-channel speech enhancement. Specifically, our method solves a kernel regression problem associated with a non-smooth kernel function (the exponential power kernel) using a highly efficient iterative method (EigenPro). Because the method is simple, its hyper-parameters, such as the kernel bandwidth, can be selected automatically and efficiently by line search on subsamples of the training data. We observe an empirical correlation between the regression loss (mean squared error) and standard metrics for speech enhancement. This observation justifies our training target and motivates us to achieve lower regression loss by training separate kernel models for different frequency subbands. We compare our method with state-of-the-art deep neural networks on mask-based speech enhancement using HINT and TIMIT. Experimental results show that our kernel method consistently outperforms deep neural networks while requiring less training time.
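
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a direct regularized linear solve stands in for the EigenPro iteration, a plain scan over candidate values stands in for the line search, and the kernel parameterization exp(-||x - z||^gamma / bandwidth) is an assumed form of the exponential power kernel. All function names, the `ridge` term, and the choice to split only the output columns into subbands are hypothetical and chosen for illustration.

```python
import numpy as np

def exp_power_kernel(X, Z, bandwidth, gamma=1.0):
    """Assumed form: k(x, z) = exp(-||x - z||^gamma / bandwidth)."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    dist = np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off
    return np.exp(-dist**gamma / bandwidth)

def fit(X, Y, bandwidth, gamma=1.0, ridge=1e-6):
    """Kernel regression via a direct solve (a stand-in for EigenPro)."""
    K = exp_power_kernel(X, X, bandwidth, gamma)
    return np.linalg.solve(K + ridge * np.eye(len(X)), Y)

def predict(X_train, alpha, X_new, bandwidth, gamma=1.0):
    """Evaluate the fitted kernel model on new feature vectors."""
    return exp_power_kernel(X_new, X_train, bandwidth, gamma) @ alpha

def select_bandwidth(X, Y, candidates, n_sub=200, seed=0):
    """Pick the bandwidth with the lowest held-out MSE on a subsample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))[:n_sub]
    half = len(idx) // 2
    tr, va = idx[:half], idx[half:]
    best_bw, best_mse = None, np.inf
    for bw in candidates:
        alpha = fit(X[tr], Y[tr], bw)
        mse = np.mean((predict(X[tr], alpha, X[va], bw) - Y[va]) ** 2)
        if mse < best_mse:
            best_bw, best_mse = bw, mse
    return best_bw, best_mse

def fit_per_subband(X, Y, n_subbands, candidates):
    """Train one model per group of mask columns (frequency subbands)."""
    models = []
    for cols in np.array_split(np.arange(Y.shape[1]), n_subbands):
        bw, _ = select_bandwidth(X, Y[:, cols], candidates)
        models.append((cols, bw, fit(X, Y[:, cols], bw)))
    return models
```

Here `X` would hold one noisy-speech feature vector per frame and `Y` the corresponding target mask values, one column per frequency bin. Each subband model gets its own bandwidth, which reflects the idea of lowering the regression loss separately for different frequency ranges.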


DOI: 10.21437/Interspeech.2019-1344

Cite as: Hui, L., Ma, S., Belkin, M. (2019) Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement. Proc. Interspeech 2019, 2748-2752, DOI: 10.21437/Interspeech.2019-1344.


@inproceedings{Hui2019,
  author={Like Hui and Siyuan Ma and Mikhail Belkin},
  title={{Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2748--2752},
  doi={10.21437/Interspeech.2019-1344},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1344}
}