Monaural Source Separation Using a Random Forest Classifier

Cosimo Riday, Saurabh Bhargava, Richard H.R. Hahnloser, Shih-Chii Liu


We address the problem of separating two audio sources from a single-channel mixture recording. We present a novel method, the Multi Layered Random Forest (MLRF), that learns a binary mask for each of the two sources. Random Forest (RF) classifiers are trained for each frequency band of a source spectrogram. A specialized set of linear transformations is applied to a local time-frequency (T-F) neighborhood of the mixture to capture relevant local statistics. We also present a sampling method that efficiently samples T-F training bins in each frequency band. We draw equal numbers of dominant (more power) training samples from the two sources for the RF classifiers that estimate the Ideal Binary Mask (IBM). The IBM estimated in a given layer is used to train an RF classifier in the next higher layer of the MLRF hierarchy. On average, MLRF performs better than deep Recurrent Neural Networks (RNNs) and Non-Negative Sparse Coding (NNSC) in signal-to-noise ratio (SNR) of the reconstructed audio, overall T-F bin classification accuracy, and PESQ and STOI scores. Additionally, we demonstrate the ability of the MLRF to correctly reconstruct T-F bins of the target even when the target has lower power in that frequency band.
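
To make the two central notions of the abstract concrete, the following is a minimal sketch of an Ideal Binary Mask and of class-balanced sampling within one frequency band. It is an illustration under stated assumptions, not the authors' implementation: the function names `ideal_binary_mask` and `balanced_frame_indices`, the toy power values, the tie-breaking rule (ties go to the second source), and the use of uniform random sampling are all hypothetical stand-ins, since the abstract does not specify these details.

```python
import random


def ideal_binary_mask(power_a, power_b):
    """Return 1 where source A has strictly more power than source B, else 0.

    power_a, power_b: lists of rows (one row per frequency band), each row a
    list of per-frame power values. Both inputs must have the same shape.
    Assumption: ties are assigned to source B.
    """
    return [[1 if a > b else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(power_a, power_b)]


def balanced_frame_indices(mask_row, n_per_class, rng):
    """Draw the same number of frame indices from each class in one band.

    Uniform sampling is an assumption; it stands in for the sampling method
    mentioned in the abstract, whose details are not given there.
    """
    a_frames = [t for t, m in enumerate(mask_row) if m == 1]
    b_frames = [t for t, m in enumerate(mask_row) if m == 0]
    # Never request more samples than the smaller class can provide.
    k = min(n_per_class, len(a_frames), len(b_frames))
    return rng.sample(a_frames, k), rng.sample(b_frames, k)


# Toy power spectrograms: 2 frequency bands x 4 frames (made-up values).
power_a = [[4.0, 1.0, 3.0, 0.5],
           [0.2, 2.0, 0.1, 5.0]]
power_b = [[1.0, 2.0, 1.0, 2.0],
           [1.0, 1.0, 1.0, 1.0]]

ibm = ideal_binary_mask(power_a, power_b)

# Balanced training frames for the first frequency band.
rng = random.Random(0)
a_idx, b_idx = balanced_frame_indices(ibm[0], 2, rng)
```

In a full pipeline, the features extracted around each sampled T-F bin and the corresponding mask values would be used to fit one classifier per frequency band; that step is omitted here because the abstract does not describe the feature transformations in detail.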


DOI: 10.21437/Interspeech.2016-252

Cite as

Riday, C., Bhargava, S., Hahnloser, R.H., Liu, S. (2016) Monaural Source Separation Using a Random Forest Classifier. Proc. Interspeech 2016, 3344-3348.

BibTeX
@inproceedings{Riday+2016,
author={Cosimo Riday and Saurabh Bhargava and Richard H.R. Hahnloser and Shih-Chii Liu},
title={Monaural Source Separation Using a Random Forest Classifier},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-252},
url={http://dx.doi.org/10.21437/Interspeech.2016-252},
pages={3344--3348}
}