ISCA Archive Interspeech 2017

Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition

Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara

We propose a novel acoustic beamforming method using blind source separation (BSS) techniques based on non-negative matrix factorization (NMF). In conventional mask-based approaches, hard or soft masks are estimated, and beamforming is performed using speech and noise spatial covariance matrices calculated from the masked noisy observations; however, the phase information of the target speech is not adequately preserved. In the proposed method, we perform complex-domain source separation based on multi-channel NMF with a rank-1 spatial model (rank-1 MNMF). The separated speech observation in each time-frequency bin is used to obtain a speech spatial covariance matrix, from which the steering vector of the target speech is estimated. This accurate steering vector estimation is combined with our novel noise mask prediction method using multi-channel robust NMF (MRNMF) to construct a Maximum Likelihood (ML) beamformer, which, without requiring any environment-specific training, achieved better speech recognition performance than a state-of-the-art DNN-based beamformer. The superiority of phase-preserving source separation over real-valued masks in beamforming is also confirmed through ASR experiments.
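The beamforming pipeline summarized above (speech and noise spatial covariance matrices, a steering vector derived from the speech covariance, and an ML beamformer) can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: rank-1 MNMF and MRNMF are not implemented here, so synthetic "separated speech" images and oracle noise frames stand in for their outputs. Taking the principal eigenvector of the speech covariance as the steering vector is a common convention assumed here, and the weights use the MVDR form w = R_n^{-1} h / (h^H R_n^{-1} h), which coincides with the ML beamformer under a Gaussian noise model. The function names `spatial_covariance`, `steering_vector`, and `ml_beamformer` are hypothetical.

```python
import numpy as np


def spatial_covariance(frames, weights=None):
    """Weighted spatial covariance matrix for one frequency bin.

    frames:  complex array of shape (T, M), T frames and M microphones.
    weights: optional real array of shape (T,), e.g. a noise mask in [0, 1].
    """
    if weights is None:
        weights = np.ones(frames.shape[0])
    # R = sum_t w_t x_t x_t^H / sum_t w_t
    scm = np.einsum("t,tm,tn->mn", weights, frames, frames.conj())
    return scm / max(weights.sum(), 1e-10)


def steering_vector(speech_scm):
    """Principal eigenvector of a Hermitian speech covariance matrix (assumed convention)."""
    _, eigvecs = np.linalg.eigh(speech_scm)
    return eigvecs[:, -1]  # eigh returns eigenvalues in ascending order


def ml_beamformer(steering, noise_scm):
    """MVDR-form weights w = R_n^{-1} h / (h^H R_n^{-1} h)."""
    rn_inv_h = np.linalg.solve(noise_scm, steering)
    return rn_inv_h / (steering.conj() @ rn_inv_h)


# Synthetic single-bin example (hypothetical data, fixed seed).
rng = np.random.default_rng(0)
M, T = 4, 200
h_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
source = rng.standard_normal(T) + 1j * rng.standard_normal(T)
speech_images = source[:, None] * h_true[None, :]  # stand-in for separated speech, (T, M)
noise = 0.5 * (rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M)))
mixture = speech_images + noise

speech_scm = spatial_covariance(speech_images)  # phase-preserving speech estimate
noise_scm = spatial_covariance(noise)           # oracle stand-in for mask-weighted noise
h_est = steering_vector(speech_scm)
w = ml_beamformer(h_est, noise_scm)
enhanced = mixture @ w.conj()  # y_t = w^H x_t for every frame
```

Because the separated speech images keep their phase, the speech covariance in this toy case is rank-1 and its principal eigenvector recovers the true steering direction up to a scalar; the weights then pass that direction with unit gain while suppressing noise. With real-valued masks applied to noisy observations, the speech covariance would instead contain residual noise, which is the motivation the abstract gives for phase-preserving separation.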


doi: 10.21437/Interspeech.2017-642

Cite as: Mimura, M., Bando, Y., Shimada, K., Sakai, S., Yoshii, K., Kawahara, T. (2017) Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition. Proc. Interspeech 2017, 2451-2455, doi: 10.21437/Interspeech.2017-642

@inproceedings{mimura17_interspeech,
  author={Masato Mimura and Yoshiaki Bando and Kazuki Shimada and Shinsuke Sakai and Kazuyoshi Yoshii and Tatsuya Kawahara},
  title={{Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2451--2455},
  doi={10.21437/Interspeech.2017-642}
}