15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Single Channel Source Separation with General Stochastic Networks

Matthias Zöhrer, Franz Pernkopf

Technische Universität Graz, Austria

Single-channel source separation (SCSS) is ill-posed and thus challenging. In this paper, we apply general stochastic networks (GSNs), a deep neural network architecture, to SCSS. We extend GSNs to predict a time-frequency representation, i.e. a soft mask, by introducing a hybrid generative-discriminative training objective to the network. We evaluate GSNs on data from the 2nd CHiME speech separation challenge. In particular, we provide results for a speaker-dependent task, a speaker-independent task, a matched noise condition task and an unmatched noise condition task. Empirically, we compare GSNs to other deep architectures, namely a deep belief network (DBN) and a multi-layer perceptron (MLP). In general, deep architectures perform well on SCSS tasks.
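To illustrate what "predicting a soft mask" means, here is a minimal sketch of a soft-mask target and how it is applied to a mixture spectrogram. It assumes a magnitude-ratio mask |S| / (|S| + |N|). The abstract does not state the exact mask definition used as the training target, and the GSN that would predict this mask from mixture features is not shown. The function names `ratio_mask` and `apply_mask` and the toy data are hypothetical.

```python
import numpy as np

def ratio_mask(speech_mag: np.ndarray, noise_mag: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Assumed magnitude-ratio soft mask: each time-frequency bin gets a value in [0, 1]
    # giving the fraction of the mixture magnitude attributed to speech.
    return speech_mag / (speech_mag + noise_mag + eps)

def apply_mask(mixture_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Element-wise scaling of the complex mixture STFT; the mixture phase is reused.
    return mask * mixture_stft

# Toy example with hypothetical data: 4 frequency bins x 3 frames of random magnitudes.
rng = np.random.default_rng(0)
speech = rng.random((4, 3))
noise = rng.random((4, 3))
phase = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(4, 3)))
# Simplifying assumption: magnitudes add in the mixture (phase interference ignored).
mixture = (speech + noise) * phase

mask = ratio_mask(speech, noise)
estimate = apply_mask(mixture, mask)  # |estimate| approximately equals the speech magnitude
```

In a learned system, the network's output layer would produce `mask` directly from the noisy input, and the masked mixture STFT would be inverted back to a waveform.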


Bibliographic reference.  Zöhrer, Matthias / Pernkopf, Franz (2014): "Single channel source separation with general stochastic networks", In INTERSPEECH-2014, 978-982.