Speech Enhancement Using Bayesian Wavenet

Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson


In recent years, deep learning has achieved great success in speech enhancement. However, existing work has two major limitations. First, many such deep-learning-based algorithms do not adopt the Bayesian framework. In particular, the prior distribution for speech in the Bayesian framework has been shown to be useful: it regularizes the output to lie in the speech space, thereby improving performance. Second, most existing methods operate in the frequency domain of the noisy speech, e.g., on the spectrogram and its variants. The clean speech is then reconstructed by overlap-add, which imposes an inherent upper bound on performance. This paper presents a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which operates directly on raw audio samples. It adopts the recently proposed WaveNet, which has been shown to be effective at modeling the conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech.
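To make the role of the speech prior concrete, one generic way to write a Bayesian enhancement objective on raw samples is as a maximum a posteriori (MAP) estimate. The notation here is an assumption for illustration only: y denotes the noisy waveform, x the clean waveform of length T, and the prior over x is factorized autoregressively, sample by sample, which is the kind of conditional distribution WaveNet models. This is a sketch of the general principle, not necessarily BaWN's exact objective or inference procedure.

```latex
% Bayes' rule: p(x | y) = p(y | x) p(x) / p(y); p(y) does not depend on x,
% so it drops out of the maximization.
\hat{\mathbf{x}}
  = \arg\max_{\mathbf{x}} \log p(\mathbf{x} \mid \mathbf{y})
  = \arg\max_{\mathbf{x}} \Big[ \underbrace{\log p(\mathbf{y} \mid \mathbf{x})}_{\text{likelihood}}
    + \underbrace{\log p(\mathbf{x})}_{\text{speech prior}} \Big]

% Autoregressive (WaveNet-style) factorization of the speech prior:
\log p(\mathbf{x}) = \sum_{t=1}^{T} \log p\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

The prior term penalizes candidate outputs that are unlikely under the speech model, which is the sense in which it "regularizes the output to lie in the speech space".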


 DOI: 10.21437/Interspeech.2017-1672

Cite as: Qian, K., Zhang, Y., Chang, S., Yang, X., Florêncio, D., Hasegawa-Johnson, M. (2017) Speech Enhancement Using Bayesian Wavenet. Proc. Interspeech 2017, 2013-2017, DOI: 10.21437/Interspeech.2017-1672.


@inproceedings{Qian2017,
  author={Kaizhi Qian and Yang Zhang and Shiyu Chang and Xuesong Yang and Dinei Florêncio and Mark Hasegawa-Johnson},
  title={Speech Enhancement Using Bayesian Wavenet},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2013--2017},
  doi={10.21437/Interspeech.2017-1672},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1672}
}