Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks

Emad M. Grais, Gerard Roma, Andrew J.R. Simpson, Mark D. Plumbley


Deep neural networks (DNNs) are usually used for single-channel source separation to predict either soft or binary time-frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks, to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between them. Our experimental results show that combining the estimates of binary and soft masks using a DNN achieves lower distortion than using each estimate individually, and achieves interference as low as that of the binary mask.
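
As a rough illustration of the two mask types contrasted above, the following minimal sketch builds a soft (ratio) mask and a binary mask and applies each to a mixture spectrogram. It is not the authors' method: the masks here are computed from known source magnitudes instead of being predicted by a DNN, the spectrograms `S1` and `S2` are random placeholder data, and the mixture is assumed to be the sum of the source magnitudes. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder magnitude spectrograms (frequency bins x time frames)
# for two sources. Random data, used only for illustration.
S1 = rng.random((5, 4))
S2 = rng.random((5, 4))

# Assumption: the mixture magnitude is the sum of the source magnitudes.
X = S1 + S2

eps = 1e-8  # guards against division by zero

# Soft mask: fraction of each time-frequency bin attributed to source 1.
soft_mask = S1 / (S1 + S2 + eps)

# Binary mask: assign each bin entirely to the dominant source.
binary_mask = (S1 > S2).astype(float)

# Apply each mask element-wise to the mixture to estimate source 1.
est_soft = soft_mask * X
est_binary = binary_mask * X
```

Under these idealized assumptions the soft mask recovers `S1` almost exactly, while the binary mask zeroes every bin where the other source dominates and passes the full mixture elsewhere. That all-or-nothing behaviour is one way to picture the trade-off between distortion and interference described in the abstract.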


DOI: 10.21437/Interspeech.2016-216

Cite as

Grais, E.M., Roma, G., Simpson, A.J., Plumbley, M.D. (2016) Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks. Proc. Interspeech 2016, 3339-3343.

Bibtex
@inproceedings{Grais+2016,
author={Emad M. Grais and Gerard Roma and Andrew J.R. Simpson and Mark D. Plumbley},
title={Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-216},
url={http://dx.doi.org/10.21437/Interspeech.2016-216},
pages={3339--3343}
}