16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Layered Nonnegative Matrix Factorization for Speech Separation

Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi

National Chiao Tung University, Taiwan

This paper proposes a layered nonnegative matrix factorization (L-NMF) algorithm for speech separation. The standard NMF method extracts parts-based bases from nonnegative training data and is often used to separate mixed spectrograms. The proposed L-NMF algorithm comprises several layers of standard NMF blocks. During training, each layer of the L-NMF is initialized separately and then fine-tuned by minimizing the propagated reconstruction error. More complicated bases of the training data emerge in deeper layers of the L-NMF by progressively combining the parts-based bases extracted in the first layer. In other words, these complicated bases contain collective information from the parts-based bases. The bases learned by all layers are then used to separate spectrograms in the conventional NMF way. Simulation results show that the proposed L-NMF outperforms the standard NMF in terms of the source-to-distortion ratio (SDR).
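The layer-wise structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the cost function or the update rules, so the code assumes a Euclidean cost with the standard multiplicative updates, and it covers only the separate per-layer initialization, leaving out the fine-tuning step based on the propagated reconstruction error. The names `nmf`, `layered_nmf_init`, and `composite_bases` are hypothetical, and the input is random data standing in for a magnitude spectrogram.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Standard NMF, V ~ W @ H, with multiplicative updates.

    Assumption: Euclidean cost; the abstract does not name the cost used.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # The updates multiply by nonnegative ratios, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def layered_nmf_init(V, ranks, n_iter=200):
    """Layer-wise initialization only (no joint fine-tuning).

    Layer 1 factorizes V; each deeper layer factorizes the activations
    produced by the layer before it.
    """
    bases = []
    X = V
    for k, r in enumerate(ranks):
        W, H = nmf(X, r, n_iter=n_iter, seed=k)
        bases.append(W)
        X = H  # activations become the input of the next layer
    return bases, X


def composite_bases(bases):
    """Express each layer's bases in the input domain: W1, W1 @ W2, ..."""
    out = []
    acc = None
    for W in bases:
        acc = W if acc is None else acc @ W
        out.append(acc)
    return out


# Hypothetical toy input: 64 frequency bins by 100 frames of random data.
rng = np.random.default_rng(42)
V = rng.random((64, 100))
bases, H_top = layered_nmf_init(V, ranks=[20, 10])
comps = composite_bases(bases)
```

In this sketch, each column of `comps[1]` is a nonnegative combination of the first-layer bases in `comps[0]`, which is one way to read the abstract's statement that deeper bases combine the parts-based bases of the first layer. To separate a mixture in the conventional NMF way, the bases of each source would be concatenated into one dictionary and held fixed while only the activations are updated.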

Bibliographic reference. Hsu, Chung-Chien / Chien, Jen-Tzung / Chi, Tai-Shih (2015): "Layered nonnegative matrix factorization for speech separation", in INTERSPEECH-2015, 628-632.