Kernel additive modeling with generalized spatial Wiener filtering (GW) was recently presented for music/voice separation. In this paper, an adaptive auditory filtering method, called generalized weighted β-order MMSE estimation (WbE), is applied to the basic iterative kernel back-fitting algorithm to improve the separation of a monaural music signal into music and voice components. In the proposed method, the perceptual weighting factor α and the singular value decomposition (SVD)-based factorized spectral amplitude exponent β are calculated adaptively for each kernel component so that the WbE-based auditory filtering is effective. Experimental results show that the proposed method achieves better separation performance than GW and existing Bayesian estimators.
Cite as: Lee, J.-Y., Cho, H.-S., Kim, H.-G. (2015) Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting. Proc. Interspeech 2015, 3317-3320, doi: 10.21437/Interspeech.2015-668
@inproceedings{lee15i_interspeech,
  author={Jun-Yong Lee and Hye-Seung Cho and Hyoung-Gook Kim},
  title={{Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3317--3320},
  doi={10.21437/Interspeech.2015-668}
}