16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Joint Optimization of Recurrent Networks Exploiting Source Auto-Regression for Source Separation

Shuai Nie (1), Wei Xue (1), Shan Liang (1), Xueliang Zhang (2), Wenju Liu (1), Liwei Qiao (3), Jianping Li (3)

(1) Chinese Academy of Sciences, China
(2) Inner Mongolia University, China
(3) SGCC, China

Source separation is particularly difficult when the interference is music. In this paper, we propose a novel recurrent network that exploits the auto-regressions of speech and music interference for source separation. Auto-regression captures the short-term temporal dependencies in the data, which aids separation. We separate the magnitude spectra of speech and interference independently from the mixture spectra by including an extra masking layer in the recurrent network. Compared to directly estimating the ideal mask, the extra masking layer relaxes the assumption of independence between speech and interference, which is better suited to real-world environments. Using the separated spectra of speech and interference, we further explore a discriminative training objective and a joint optimization framework for the proposed network, which incorporate the correlations and spectral dependencies of speech and interference into the separation. Systematic experiments show that the proposed model is competitive with the state-of-the-art method in singing-voice separation.
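
The masking layer and the discriminative objective mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the soft-mask form, the cross-term loss, the function names `soft_mask_layer` and `discriminative_loss`, and the weight `gamma` are hypothetical and follow a formulation commonly used in recurrent-network singing-voice separation, not necessarily the one in the paper. The recurrent network and its auto-regressive connections are not modeled; the inputs stand in for the network's two output branches.

```python
import numpy as np

def soft_mask_layer(y_speech, y_interf, mixture, eps=1e-8):
    """Assumed masking layer: split the mixture magnitude spectrum
    between two sources in proportion to the raw network outputs."""
    a = np.abs(y_speech)
    b = np.abs(y_interf)
    mask = a / (a + b + eps)  # soft mask with values in [0, 1]
    # The two estimates always sum back to the mixture magnitude.
    return mask * mixture, (1.0 - mask) * mixture

def discriminative_loss(s_hat, n_hat, s_ref, n_ref, gamma=0.05):
    """Assumed discriminative objective: reward closeness to the own
    target and penalize closeness to the competing source."""
    own = np.sum((s_hat - s_ref) ** 2) + np.sum((n_hat - n_ref) ** 2)
    cross = np.sum((s_hat - n_ref) ** 2) + np.sum((n_hat - s_ref) ** 2)
    return own - gamma * cross

# Toy usage with random stand-ins for (frames x frequency bins) spectra.
rng = np.random.default_rng(0)
mixture = rng.random((4, 6))           # mixture magnitude spectra
y_speech = rng.standard_normal((4, 6)) # hypothetical network output, speech branch
y_interf = rng.standard_normal((4, 6)) # hypothetical network output, interference branch
speech_hat, interf_hat = soft_mask_layer(y_speech, y_interf, mixture)
```

Because the mask is computed inside the network from both output branches, the training loss can be applied to the separated spectra themselves, which is one plausible reading of the joint optimization described in the abstract.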

Bibliographic reference. Nie, Shuai / Xue, Wei / Liang, Shan / Zhang, Xueliang / Liu, Wenju / Qiao, Liwei / Li, Jianping (2015): "Joint optimization of recurrent networks exploiting source auto-regression for source separation", In INTERSPEECH-2015, 3307-3311.