Source separation is very difficult when the interference is music. In this paper, we propose a novel recurrent network that exploits the auto-regressions of speech and music interference for source separation. Auto-regression captures the short-term temporal dependencies in the data, which helps the separation. We independently separate the magnitude spectra of speech and interference from the mixture spectra by including an extra masking layer in the recurrent network. Compared with directly estimating the ideal mask, the extra masking layer relaxes the assumption of independence between speech and interference, which makes it more suitable for real-world environments. Using the separated spectra of speech and interference, we further explore a discriminative training objective and a joint optimization framework for the proposed network, which incorporate the correlations and spectral dependencies of speech and interference into the separation. Systematic experiments show that the proposed model is competitive with the state-of-the-art method in singing-voice separation.
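As a rough illustration of the extra masking layer and the discriminative objective mentioned above, the following is a minimal NumPy sketch. The abstract gives neither formula, so everything here is an assumption. The names `soft_mask_layer` and `discriminative_loss` and the weight `gamma` are hypothetical. The ratio-style soft mask and the squared-error loss with a cross-source penalty are common choices in masking-based separation and are not necessarily the ones used in the paper. The recurrent network and the auto-regressive part are omitted; the raw estimates stand in for the network outputs.

```python
import numpy as np


def soft_mask_layer(speech_raw, interf_raw, mixture, eps=1e-8):
    """Turn two raw magnitude estimates into masked spectra.

    Assumed form: each source gets a ratio mask built from both raw
    estimates, so the two masked outputs sum (almost exactly) to the
    mixture magnitude spectrum.
    """
    speech_mag = np.abs(speech_raw)
    interf_mag = np.abs(interf_raw)
    total = speech_mag + interf_mag + eps  # eps avoids division by zero
    speech_hat = speech_mag / total * mixture
    interf_hat = interf_mag / total * mixture
    return speech_hat, interf_hat


def discriminative_loss(speech_hat, interf_hat, speech_ref, interf_ref, gamma=0.05):
    """Squared error to the correct source minus a penalty term.

    Assumed form: the second term rewards estimates that are far from
    the *other* source; gamma (hypothetical value) sets its weight.
    """
    fit = np.sum((speech_hat - speech_ref) ** 2) + np.sum((interf_hat - interf_ref) ** 2)
    cross = np.sum((speech_hat - interf_ref) ** 2) + np.sum((interf_hat - speech_ref) ** 2)
    return fit - gamma * cross


# Toy magnitude spectra (frames x frequency bins), purely synthetic.
rng = np.random.default_rng(0)
speech_ref = rng.random((4, 5))
interf_ref = rng.random((4, 5))
mixture = speech_ref + interf_ref  # assumes magnitudes add, a simplification

# Pretend the network produced perfect raw estimates.
speech_hat, interf_hat = soft_mask_layer(speech_ref, interf_ref, mixture)
loss = discriminative_loss(speech_hat, interf_hat, speech_ref, interf_ref)
```

Because the masks are computed from both raw estimates together, the two outputs are constrained to share the mixture energy, which is one way to read the abstract's point that the masking layer couples the estimates of speech and interference.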
Bibliographic reference. Nie, Shuai / Xue, Wei / Liang, Shan / Zhang, Xueliang / Liu, Wenju / Qiao, Liwei / Li, Jianping (2015): "Joint optimization of recurrent networks exploiting source auto-regression for source separation", In INTERSPEECH-2015, 3307-3311.