Eigenvector-Based Speech Mask Estimation Using Logistic Regression

Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf


In this paper, we use logistic regression to learn a speech mask from the dominant eigenvector of the Power Spectral Density (PSD) matrix of a multi-channel speech signal corrupted by ambient noise. We employ this speech mask to construct the Generalized Eigenvalue (GEV) beamformer and a Wiener postfilter. Further, we extend the beamformer to compensate for speech distortions. We do not make any assumptions about the array geometry or the characteristics of the speech and noise sources; these parameters are learned from training data. We assume that the speaker may move slowly in the near field of the array and that the noise is in the far field. We compare our speech enhancement system against recent contributions using the CHiME4 corpus. We show that our approach yields superior results, both in terms of perceptual speech quality and speech mask estimation error.
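The two linear-algebra steps named in the abstract, taking the dominant eigenvector of a PSD matrix and building a GEV beamformer from mask-weighted PSD matrices, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the diagonal loading, and the unit-norm scaling are assumptions made for illustration, and the speech mask is taken as given rather than predicted by the logistic regression described in the paper.

```python
import numpy as np


def masked_psd(Y, mask, eps=1e-10):
    """Mask-weighted PSD matrix for one frequency bin.

    Y:    (channels, frames) complex STFT coefficients (assumed layout).
    mask: (frames,) weights in [0, 1].
    """
    weights = mask / max(mask.sum(), eps)
    # Sum over frames of weights[t] * y_t y_t^H.
    return (Y * weights) @ Y.conj().T


def dominant_eigenvector(phi):
    """Eigenvector of a Hermitian matrix belonging to its largest eigenvalue."""
    _, vecs = np.linalg.eigh(phi)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


def gev_beamformer(phi_s, phi_n, loading=1e-6):
    """Principal generalized eigenvector of phi_s w = lambda * phi_n w.

    The diagonal loading is an assumption of this sketch; it keeps the
    noise PSD matrix positive definite for the Cholesky factorization.
    """
    m = phi_n.shape[0]
    phi_n = phi_n + loading * np.trace(phi_n).real / m * np.eye(m)
    L = np.linalg.cholesky(phi_n)          # phi_n = L L^H
    L_inv = np.linalg.inv(L)
    A = L_inv @ phi_s @ L_inv.conj().T     # whitened speech PSD matrix
    A = 0.5 * (A + A.conj().T)             # enforce Hermitian symmetry
    w = L_inv.conj().T @ dominant_eigenvector(A)
    return w / np.linalg.norm(w)
```

With a rank-one speech PSD matrix and spatially white noise, the resulting weight vector is expected to align with the speech steering vector up to a complex scale factor. That arbitrary scale is one reason a GEV beamformer is typically paired with a distortion compensation step, as the abstract mentions.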


DOI: 10.21437/Interspeech.2017-1186

Cite as: Pfeifenberger, L., Zöhrer, M., Pernkopf, F. (2017) Eigenvector-Based Speech Mask Estimation Using Logistic Regression. Proc. Interspeech 2017, 2660-2664, DOI: 10.21437/Interspeech.2017-1186.


@inproceedings{Pfeifenberger2017,
  author={Lukas Pfeifenberger and Matthias Zöhrer and Franz Pernkopf},
  title={Eigenvector-Based Speech Mask Estimation Using Logistic Regression},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2660--2664},
  doi={10.21437/Interspeech.2017-1186},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1186}
}