Real-Time Speech Enhancement with GCC-NMF

Sean U.N. Wood, Jean Rouat


We develop an online variant of the GCC-NMF blind speech enhancement algorithm and study its performance on two-channel mixtures of speech and real-world noise from the SiSEC separation challenge. While GCC-NMF performs enhancement independently for each time frame, the NMF dictionary, its activation coefficients, and the target TDOA (time difference of arrival) are derived from the entire mixture signal, thus precluding its use online. Pre-learning the NMF dictionary on the CHiME dataset and inferring its activation coefficients online yields overall PEASS scores similar to the mixture-learned method, thus generalizing to new speakers, acoustic environments, and noise conditions. Surprisingly, if we forgo coefficient inference altogether, this approach outperforms both the mixture-learned method and most algorithms from the SiSEC challenge to date. Furthermore, the trade-off between interference suppression and target fidelity may be controlled online by adjusting the target TDOA window width. Finally, integrating online target localization with max-pooled GCC-PHAT yields only somewhat decreased performance compared to offline localization. We test a real-time implementation of the online GCC-NMF blind speech enhancement system on a variety of hardware platforms, where performance degrades gracefully with decreasing computational power by using smaller pre-learned dictionaries.
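The abstract describes inferring NMF activation coefficients online against a fixed, pre-learned dictionary. As a minimal sketch of that sub-step, the following applies the standard multiplicative updates for the KL divergence with the dictionary held fixed; the function name, initialization, and iteration count are our own illustrative choices, not details from the paper.

```python
import numpy as np

def infer_activations(V, W, n_iter=300, eps=1e-12, seed=0):
    """Infer NMF activation coefficients H for a nonnegative matrix V
    (e.g. a magnitude spectrogram, frequencies x frames) given a FIXED
    pre-learned dictionary W, via KL-divergence multiplicative updates.
    With W held fixed, this subproblem is convex in H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # random nonnegative init
    wsum = W.sum(axis=0)[:, None] + eps             # denominator: W^T 1
    for _ in range(n_iter):
        WH = W @ H + eps                            # current reconstruction
        H *= (W.T @ (V / WH)) / wsum                # multiplicative KL update
    return H
```

Because the update is multiplicative, H stays nonnegative throughout, and only H changes frame to frame, which is what makes the online variant feasible.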


DOI: 10.21437/Interspeech.2017-1458

Cite as: Wood, S.U.N., Rouat, J. (2017) Real-Time Speech Enhancement with GCC-NMF. Proc. Interspeech 2017, 2665-2669, DOI: 10.21437/Interspeech.2017-1458.


@inproceedings{Wood2017,
  author={Sean U.N. Wood and Jean Rouat},
  title={Real-Time Speech Enhancement with GCC-NMF},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2665--2669},
  doi={10.21437/Interspeech.2017-1458},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1458}
}