A Probability Weighted Beamformer for Noise Robust ASR

Suliang Bu, Yunxin Zhao, Meiyuh Hwang, Sining Sun

We investigate a novel spatial filtering approach to noise removal that exploits speech sparsity and adapts to the conditions at each time-frequency (TF) point. Our approach combines a noise reduction beamformer with either a minimum variance distortionless response (MVDR) beamformer or a generalized eigenvalue (GEV) beamformer, weighted by the TF posterior probabilities of speech presence (PPSP). To estimate the PPSP, we study both statistical model-based and neural network (NN) based methods. In the former, we apply complex Gaussian mixture modeling (CGMM) to temporally augmented spatial spectral features. In the latter, we use NN-based TF masks to initialize the speech and noise covariance matrices in CGMM. We conducted experiments on the CHiME-3 task. On its real noisy speech test set, our three methods (feature augmentation, the TF-dependent spatial filter, and NN-mask-based initialization of the CGMM covariance matrices) yielded cumulative relative word error rate (WER) reductions of 8%, 16%, and 25%, respectively, over the original CGMM-based MVDR. When MVDR was replaced by GEV, the three methods also produced consistent WER reductions on the real test data.
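The PPSP-weighted combination can be illustrated with a minimal NumPy sketch. It assumes a multichannel STFT `X` of shape (frequency, frames, mics) and a PPSP array `ppsp` of shape (frequency, frames), which here is random and stands in for CGMM or NN posteriors. The MVDR steering vector is taken as the principal eigenvector of the speech covariance, which is one common CGMM-style choice. The abstract does not specify the noise reduction beamformer, so `w_nr` below is a hypothetical placeholder (an attenuated MVDR). Because the PPSP is real, applying the TF-dependent filter p·w_s + (1 − p)·w_n to x equals mixing the two beamformer outputs with weights p and 1 − p. All function names are illustrative assumptions, not the authors' code.

```python
import numpy as np


def masked_covariance(X, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    X:    complex STFT, shape (F, T, M) = (freq bins, frames, mics)
    mask: real TF weights in [0, 1], shape (F, T), e.g. PPSP or 1 - PPSP
    Returns a complex array of shape (F, M, M).
    """
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    den = np.maximum(mask.sum(axis=1), eps)[:, None, None]
    return num / den


def mvdr_weights(phi_s, phi_n, load=1e-6):
    """MVDR weights w = Phi_n^{-1} d / (d^H Phi_n^{-1} d) for each bin, where
    d is the principal eigenvector of the speech covariance (an assumption)."""
    F, M, _ = phi_s.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        _, vecs = np.linalg.eigh(phi_s[f])      # eigenvalues in ascending order
        d = vecs[:, -1]                         # steering vector estimate
        phi_n_f = phi_n[f] + load * np.eye(M)   # diagonal loading for stability
        num = np.linalg.solve(phi_n_f, d)       # Phi_n^{-1} d
        W[f] = num / (d.conj() @ num)           # enforce w^H d = 1
    return W


def ppsp_weighted_output(X, ppsp, w_speech, w_noise):
    """Y(f,t) = p * w_s^H x + (1 - p) * w_n^H x, which equals applying the
    TF-dependent filter w(f,t) = p * w_s + (1 - p) * w_n (p is real)."""
    y_s = np.einsum("fm,ftm->ft", w_speech.conj(), X)
    y_n = np.einsum("fm,ftm->ft", w_noise.conj(), X)
    return ppsp * y_s + (1.0 - ppsp) * y_n


rng = np.random.default_rng(0)
F, T, M = 4, 50, 3
X = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
ppsp = rng.uniform(size=(F, T))          # stand-in for CGMM / NN posteriors
phi_s = masked_covariance(X, ppsp)       # speech covariance
phi_n = masked_covariance(X, 1.0 - ppsp) # noise covariance
w_mvdr = mvdr_weights(phi_s, phi_n)
w_nr = 0.1 * w_mvdr                      # HYPOTHETICAL noise reduction beamformer
Y = ppsp_weighted_output(X, ppsp, w_mvdr, w_nr)
print(Y.shape)  # (4, 50)
```

A mask-weighted covariance such as `masked_covariance` is also one plausible way for NN-estimated TF masks to initialize the speech and noise covariance matrices in CGMM. Replacing `w_mvdr` with GEV weights would give the GEV variant of the same weighting.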

DOI: 10.21437/Interspeech.2018-2427

Cite as: Bu, S., Zhao, Y., Hwang, M., Sun, S. (2018) A Probability Weighted Beamformer for Noise Robust ASR. Proc. Interspeech 2018, 3048-3052, DOI: 10.21437/Interspeech.2018-2427.
