A Feature Study for Masking-Based Reverberant Speech Separation

Masood Delfarah, DeLiang Wang


Monaural speech separation in reverberant conditions is very challenging. In masking-based separation, features extracted from speech mixtures are employed to predict a time-frequency mask. Robust feature extraction is crucial for the performance of supervised speech separation in adverse acoustic environments. Using objective speech intelligibility as the metric, we investigate a wide variety of monaural features in low signal-to-noise ratios and moderate to high reverberation. Deep neural networks are employed as the learning machine in our feature investigation. We find considerable performance gain from using a contextual window in reverberant speech processing, likely due to the temporal structure of reverberation. In addition, we systematically evaluate feature combinations. In unmatched noise and reverberation conditions, the resulting feature set from this study substantially outperforms sets previously employed for speech separation in anechoic conditions.
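To make the two core ingredients of the abstract concrete, the sketch below illustrates (a) the ideal ratio mask (IRM), a common time-frequency training target in masking-based separation, and (b) frame splicing, one standard way to implement a contextual window over feature frames. This is an illustrative sketch only, not the authors' exact pipeline; all array shapes, the context size, and the function names are hypothetical.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """IRM per T-F unit: sqrt of speech energy over total energy, in [0, 1]."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def splice_frames(feats, context=2):
    """Contextual window: stack each frame with its +/-context neighbors.

    feats: (num_features, num_frames); edges are handled by edge-padding.
    Returns (num_features * (2*context + 1), num_frames).
    """
    num_frames = feats.shape[1]
    padded = np.pad(feats, ((0, 0), (context, context)), mode="edge")
    return np.concatenate(
        [padded[:, i:i + num_frames] for i in range(2 * context + 1)], axis=0
    )

# Toy magnitude spectrograms (freq bins x frames); real systems use an STFT.
rng = np.random.default_rng(0)
speech = np.abs(rng.normal(size=(257, 100)))
noise = np.abs(rng.normal(size=(257, 100)))

mask = ideal_ratio_mask(speech, noise)
mixture = speech + noise          # crude magnitude-domain mixing, for illustration
estimate = mask * mixture         # masking-based separation of the mixture

# A DNN would be trained to map spliced mixture features to the mask.
spliced = splice_frames(mixture, context=2)
```

In a supervised system, `spliced` (or spliced acoustic features rather than raw magnitudes) would be the network input and `mask` the training target; at test time the predicted mask is applied to the mixture before resynthesis.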


DOI: 10.21437/Interspeech.2016-382

Cite as

Delfarah, M., Wang, D. (2016) A Feature Study for Masking-Based Reverberant Speech Separation. Proc. Interspeech 2016, 555-559.

Bibtex
@inproceedings{Delfarah+2016,
author={Masood Delfarah and DeLiang Wang},
title={A Feature Study for Masking-Based Reverberant Speech Separation},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-382},
url={http://dx.doi.org/10.21437/Interspeech.2016-382},
pages={555--559}
}