Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation

Hao Li, Shuai Nie, Xueliang Zhang, Hui Zhang


Convolutive non-negative matrix factorization (CNMF) and deep neural networks (DNNs) are two efficient methods for monaural speech separation. A conventional DNN focuses on building the non-linear relationship between the mixture and the target speech, but it ignores the prominent structure of the target speech. A conventional CNMF model concentrates on capturing the prominent harmonic structures and temporal continuities of speech, but it ignores the non-linear relationship between the mixture and the target. Taking both aspects into consideration at the same time may result in better performance. In this paper, we propose jointly optimizing a DNN model with an extra CNMF layer for the speech separation task. We also add an extra masking layer to the proposed model to constrain the speech reconstruction. Moreover, a discriminative training criterion is proposed to further enhance separation performance. Experimental results show that the proposed model achieves significant improvements in PESQ, SAR, SIR and SDR compared with conventional methods.
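
The abstract does not spell out the CNMF layer or the masking layer, so the following is only a minimal NumPy sketch of the standard building blocks they most likely rest on, not the authors' implementation. It assumes the usual CNMF model, in which a magnitude spectrogram is approximated by a sum of basis matrices applied to time-shifted activation coefficients, and it assumes a ratio-style soft mask for the masking step. The function names (`shift_right`, `cnmf_reconstruct`, `soft_mask`), the array shapes, and the `eps` constant are all hypothetical choices made for illustration.

```python
import numpy as np


def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out


def cnmf_reconstruct(W, H):
    """Standard CNMF reconstruction (assumed form, not taken from the paper).

    W: bases of shape (T, F, R), one (F, R) matrix per time lag.
    H: activation coefficients of shape (R, N).
    Returns an (F, N) non-negative spectrogram estimate.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))


def soft_mask(speech_est, noise_est, mixture, eps=1e-8):
    """Ratio mask (an assumption): scale the mixture by the speech share."""
    return speech_est / (speech_est + noise_est + eps) * mixture


# Toy example: one frequency bin, one basis, two time lags.
W = np.ones((2, 1, 1))
H = np.array([[1.0, 2.0, 3.0]])
V_hat = cnmf_reconstruct(W, H)
```

In a jointly optimized model of the kind the abstract describes, a DNN would presumably output the activation coefficients `H` for each source, and the reconstruction and mask would then be differentiable steps that the training loss is propagated through.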


DOI: 10.21437/Interspeech.2016-120

Cite as

Li, H., Nie, S., Zhang, X., Zhang, H. (2016) Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation. Proc. Interspeech 2016, 550-554.

Bibtex
@inproceedings{Li+2016,
author={Hao Li and Shuai Nie and Xueliang Zhang and Hui Zhang},
title={Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-120},
url={http://dx.doi.org/10.21437/Interspeech.2016-120},
pages={550--554}
}