A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning

Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li


This paper addresses the problem of monaural speech separation for simultaneous speakers. Recent studies such as uPIT, cuPIT-Grid LSTM and their variants have advanced the state of the art in separation models. Delta and acceleration coefficients are typically used in the objective function to capture short-time dynamics. We argue that such coefficients do not exploit temporal information over longer ranges, such as phonemes and syllables. In this paper, we propose a shifted delta coefficient (SDC) objective to exploit the long-range temporal information in the spectral dynamics. The SDC ensures the temporal continuity of output frames within the same speaker. In addition, we propose a novel multi-task learning framework, which we call SDC-MTL, that extends the SDC objective with a subtask of predicting the time-frequency labels ({silence, single, overlapped}) of the mixture. The experimental results show 11.7% and 3.9% relative improvements on the WSJ0-2mix dataset under open conditions over the uPIT and cuPIT-Grid LSTM baselines, respectively. Further analysis shows 17.8% and 6.2% relative improvements for speakers of the same gender.
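
To illustrate how shifted delta coefficients span a longer temporal context than ordinary deltas, the following is a minimal sketch of the conventional SDC computation known from language recognition. It is not taken from the paper: the function name `shifted_delta`, the parameters `d` (delta spread), `p` (shift between blocks) and `k` (number of stacked blocks), their default values, and the edge-clamping strategy are all assumptions made for illustration.

```python
from typing import List, Sequence


def shifted_delta(
    frames: Sequence[Sequence[float]],
    d: int = 1,   # assumed delta spread (frames on each side)
    p: int = 3,   # assumed shift between consecutive delta blocks
    k: int = 7,   # assumed number of stacked delta blocks
) -> List[List[float]]:
    """Stack k delta blocks per frame.

    Block i at frame t is frames[t + i*p + d] - frames[t + i*p - d].
    Indices outside the utterance are clamped to the nearest edge frame
    (an assumption; the abstract does not specify edge handling).
    """
    n = len(frames)
    if n == 0:
        return []

    def clamp(idx: int) -> int:
        return min(max(idx, 0), n - 1)

    out: List[List[float]] = []
    for t in range(n):
        row: List[float] = []
        for i in range(k):
            ahead = frames[clamp(t + i * p + d)]
            behind = frames[clamp(t + i * p - d)]
            # Element-wise difference across the feature dimension.
            row.extend(a - b for a, b in zip(ahead, behind))
        out.append(row)
    return out
```

Each output frame covers roughly `(k - 1) * p + 2 * d + 1` input frames, which is how the stacked blocks reach beyond the short window of a single delta. In an objective of the kind the abstract describes, one would presumably compute such features for both the estimated and the target spectra and penalize their difference; the exact loss used in the paper is not given here.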


DOI: 10.21437/Interspeech.2018-1150

Cite as: Xu, C., Rao, W., Chng, E.S., Li, H. (2018) A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning. Proc. Interspeech 2018, 3479-3483, DOI: 10.21437/Interspeech.2018-1150.


@inproceedings{Xu2018,
  author={Chenglin Xu and Wei Rao and Eng Siong Chng and Haizhou Li},
  title={A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3479--3483},
  doi={10.21437/Interspeech.2018-1150},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1150}
}