ISCA Archive Interspeech 2017

Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement

Ricard Marxer, Jon Barker

In recent years, speech enhancement by analysis-resynthesis has emerged as an alternative to conventional noise filtering approaches. Analysis-resynthesis replaces noisy speech with a signal that has been reconstructed from a clean speech model. It can deliver high-quality signals with no residual noise, but at the expense of losing information from the original signal that is not well-represented by the model. A recent compromise solution, called constrained resynthesis, solves this problem by only resynthesising spectro-temporal regions that are estimated to be masked by noise (conditioned on the evidence in the unmasked regions). In this paper we first extend the approach by: i) introducing multi-condition training and a deep discriminative model for the analysis stage; ii) introducing an improved resynthesis model that captures within-state cross-frequency dependencies. We then extend the previous stationary-noise evaluation by using real domestic audio noise from the CHiME-2 evaluation. We compare various mask estimation strategies while varying the degree of constraint by tuning the threshold for reliable speech detection. PESQ and log-spectral distance measures show that although mask estimation remains a challenge, it is only necessary to estimate a few reliable signal regions in order to achieve performance close to that achieved with an optimal oracle mask.
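
The thresholding and constrained replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a per-cell speech-presence probability as the mask estimator's output, the (frames, frequency bins) array layout, and the particular form of the log-spectral distance are all assumptions made for illustration. The clean-speech model's reconstruction is taken as a given input here, whereas in the paper it is conditioned on the evidence in the unmasked regions.

```python
import numpy as np


def binary_mask(speech_prob, threshold):
    """Mark time-frequency cells as reliable (True) where the estimated
    speech-presence probability meets the threshold (assumed estimator output)."""
    return speech_prob >= threshold


def constrained_resynthesis(noisy_logspec, model_logspec, mask):
    """Keep reliable cells from the noisy observation and replace the
    remaining cells with the model's reconstruction."""
    return np.where(mask, noisy_logspec, model_logspec)


def log_spectral_distance(ref_logspec, est_logspec):
    """Average over frames of the per-frame RMS difference between two
    log spectra, both shaped (frames, frequency bins)."""
    per_frame = np.sqrt(np.mean((ref_logspec - est_logspec) ** 2, axis=1))
    return float(np.mean(per_frame))


# Toy data: 2 frames x 2 frequency bins (hypothetical values).
speech_prob = np.array([[0.9, 0.2], [0.4, 0.8]])
noisy = np.array([[1.0, 2.0], [3.0, 4.0]])
model = np.zeros((2, 2))

mask = binary_mask(speech_prob, threshold=0.5)
enhanced = constrained_resynthesis(noisy, model, mask)
```

Raising the threshold marks fewer cells as reliable, so more of the output comes from the model; lowering it keeps more of the original signal, including any noise in cells that were wrongly marked reliable.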

doi: 10.21437/Interspeech.2017-1257

Cite as: Marxer, R., Barker, J. (2017) Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement. Proc. Interspeech 2017, 1988-1992, doi: 10.21437/Interspeech.2017-1257

  author={Ricard Marxer and Jon Barker},
  title={{Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement}},
  booktitle={Proc. Interspeech 2017},