ISCA Archive Interspeech 2013

Speech spectrum restoration based on conditional restricted Boltzmann machine

Xugang Lu, Shigeki Matsuda, Chiori Hori

Many speech enhancement algorithms have been proposed for restoring speech from distorted speech. However, if some components of the signal are completely missing or distorted, these algorithms cannot restore the clean speech. Because the restricted Boltzmann machine (RBM) is a stochastic version of the Hopfield network, which can be used as an associative memory, we propose to use its "recall" ability for speech spectrum restoration when some parts of the speech spectrum are completely missing or distorted. Traditionally, when training the RBM, speech spectral patches are randomly selected as input, with no consideration of the temporal correlation between different input spectral patches. In this study, we further propose to model this temporal correlation using a conditional RBM (CRBM). Inference on the CRBM is almost the same as on the RBM; the only change is that the biases become conditional dynamic biases. We conducted experiments on clean speech reconstruction and distorted speech restoration based on the trained models. Our experimental results showed that both the RBM and the CRBM worked well in the restoration task. By incorporating temporal correlation, the CRBM achieved a further improvement in reconstruction and restoration accuracy.
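
The relation between RBM and CRBM inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: sigmoid (binary-style) visible units with spectra scaled to [0, 1], a deterministic mean-field update in place of stochastic sampling, untrained random weights, and hypothetical names (`dynamic_biases`, `restore`, `A`, `B`, `history`) that do not come from the paper. The point it shows is that the recall loop is unchanged for the CRBM; only the biases are shifted by a function of the preceding frames.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_biases(b_vis, b_hid, A, B, history):
    """CRBM step (assumed linear form): shift static biases using past frames."""
    return b_vis + A @ history, b_hid + B @ history

def restore(v_obs, known, W, b_vis, b_hid, n_iter=20):
    """Fill in missing visible units by mean-field recall, clamping known ones."""
    v = np.where(known, v_obs, 0.5)        # neutral start for missing components
    for _ in range(n_iter):
        h = sigmoid(W.T @ v + b_hid)       # hidden activation probabilities
        v_new = sigmoid(W @ h + b_vis)     # reconstructed visible probabilities
        v = np.where(known, v_obs, v_new)  # keep observed components fixed
    return v

# Toy dimensions and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
n_vis, n_hid, n_hist = 6, 4, 12
W = 0.1 * rng.standard_normal((n_vis, n_hid))   # visible-hidden weights
A = 0.1 * rng.standard_normal((n_vis, n_hist))  # history -> visible bias
B = 0.1 * rng.standard_normal((n_hid, n_hist))  # history -> hidden bias
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

history = rng.random(n_hist)   # concatenated preceding frames (hypothetical)
v_obs = rng.random(n_vis)      # current frame, some components unreliable
known = np.array([True, True, False, False, True, True])

# Plain RBM recall uses the static biases.
rbm_out = restore(v_obs, known, W, b_vis, b_hid)

# CRBM recall reuses the same routine with history-dependent biases.
b_vis_dyn, b_hid_dyn = dynamic_biases(b_vis, b_hid, A, B, history)
crbm_out = restore(v_obs, known, W, b_vis_dyn, b_hid_dyn)
```

With an all-zero `history`, the dynamic biases reduce to the static ones and the two calls coincide, which is one way to read the statement that CRBM inference differs from RBM inference only in its biases.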

doi: 10.21437/Interspeech.2013-722

Cite as: Lu, X., Matsuda, S., Hori, C. (2013) Speech spectrum restoration based on conditional restricted boltzmann machine. Proc. Interspeech 2013, 3259-3263, doi: 10.21437/Interspeech.2013-722

  author={Xugang Lu and Shigeki Matsuda and Chiori Hori},
  title={{Speech spectrum restoration based on conditional restricted boltzmann machine}},
  booktitle={Proc. Interspeech 2013},