Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization

Pavan Karjol, Prasanta Kumar Ghosh


We consider the problem of speech enhancement using a deep mixture of experts. A deep mixture of experts, in which each expert is a deep neural network (DNN), is difficult to train because of its network structure. In this work, we propose a pre-training method for the individual DNNs in a deep mixture of experts: we use hard expectation maximization (EM) to pre-train each expert. After pre-training, we take a weighted combination of the outputs of the individual DNN experts and jointly train the whole system. We compare the proposed method with a single-DNN speech enhancement scheme. Speech enhancement experiments in four SNR conditions show that the proposed method outperforms this baseline. Averaged over four seen noise cases, the improvements over the single-DNN scheme are 0.08 in perceptual evaluation of speech quality (PESQ), 0.59 dB in segmental signal-to-noise ratio (seg SNR), and 0.015 in short-time objective intelligibility (STOI).
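
The two stages described in the abstract (hard-EM pre-training of the experts, then a weighted combination of their outputs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes linear least-squares experts as stand-ins for the DNNs, a toy piecewise-linear regression task in place of speech features, and a one-hot gate derived from the hard assignments in place of a learned gating network. The names `add_bias`, `hard_em_pretrain`, and `combine` are hypothetical, and the joint gradient-based training stage is omitted.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so each linear expert has a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def hard_em_pretrain(X, Y, n_experts=2, n_iters=10, seed=0):
    """Hard EM with linear experts (assumed stand-ins for DNN experts)."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    # Random initial hard assignment of every sample to one expert.
    assign = rng.integers(n_experts, size=Xb.shape[0])
    W = [np.zeros((Xb.shape[1], Y.shape[1])) for _ in range(n_experts)]
    for _ in range(n_iters):
        # M-step: refit each expert only on the samples assigned to it.
        for k in range(n_experts):
            idx = assign == k
            if idx.any():
                W[k], *_ = np.linalg.lstsq(Xb[idx], Y[idx], rcond=None)
        # Hard E-step: reassign each sample to its lowest-error expert.
        errs = np.stack(
            [((Xb @ Wk - Y) ** 2).sum(axis=1) for Wk in W], axis=1
        )
        assign = errs.argmin(axis=1)
    return W, assign

def combine(Xb, W, gate):
    """Weighted combination of expert outputs; gate has shape (n, K)."""
    outs = np.stack([Xb @ Wk for Wk in W], axis=1)  # (n, K, d_out)
    return (gate[:, :, None] * outs).sum(axis=1)

# Toy data: two linear regimes, so two experts can specialize.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
Y = np.where(X < 0, 2.0 * X, -3.0 * X) + 0.01 * rng.standard_normal((200, 1))

W, assign = hard_em_pretrain(X, Y)
Xb = add_bias(X)

# Single-model baseline: one least-squares fit on all the data.
single_W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
single_err = ((Xb @ single_W - Y) ** 2).sum()

# One-hot gate from the hard assignments (placeholder for a learned gate).
gate = np.eye(len(W))[assign]
mix_err = ((combine(Xb, W, gate) - Y) ** 2).sum()
print(f"single-model error: {single_err:.4f}, mixture error: {mix_err:.4f}")
```

In this sketch the hard E-step replaces the soft posterior weights of standard EM with a winner-take-all assignment, so each expert is fit only on the samples it currently explains best. In a full system, the one-hot gate would presumably be replaced by a trainable gating network whose weights are learned jointly with the experts.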


DOI: 10.21437/Interspeech.2018-1730

Cite as: Karjol, P., Ghosh, P.K. (2018) Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization. Proc. Interspeech 2018, 3254-3258, DOI: 10.21437/Interspeech.2018-1730.


@inproceedings{Karjol2018,
  author={Pavan Karjol and Prasanta Kumar Ghosh},
  title={Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3254--3258},
  doi={10.21437/Interspeech.2018-1730},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1730}
}