Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems

Deepak Baby, Sarah Verhulst


Recent neural network (NN)-based speech enhancement schemes have been shown to outperform most conventional techniques. However, the performance of such systems in adverse listening conditions, such as negative signal-to-noise ratios and unseen noises, is still far from that of humans. Motivated by the remarkable performance of humans under these challenging conditions, this paper investigates whether biophysically-inspired features can mitigate the poor generalization capabilities of NN-based speech enhancement systems. We use features derived from several human auditory periphery models to train a speech enhancement system based on long short-term memory (LSTM) and evaluate them on a variety of mismatched testing conditions. The results reveal that features from biophysically-inspired auditory models, such as nonlinear transmission line models, improve the generalizability of LSTM-based noise suppression systems in terms of various objective quality measures, suggesting that such features lead to robust speech representations that are less sensitive to the noise type.


DOI: 10.21437/Interspeech.2018-1237

Cite as: Baby, D., Verhulst, S. (2018) Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems. Proc. Interspeech 2018, 3264-3268, DOI: 10.21437/Interspeech.2018-1237.


@inproceedings{Baby2018,
  author={Deepak Baby and Sarah Verhulst},
  title={Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3264--3268},
  doi={10.21437/Interspeech.2018-1237},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1237}
}