LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory

Toru Nakashika


In this paper, we propose a novel probabilistic model, namely long short-term Boltzmann memory (LSTBM), to represent sequential data such as speech spectra. The LSTBM is an extension of a restricted Boltzmann machine (RBM) that has generative long short-term memory (LSTM) units. The original RBM automatically learns relationships between visible and hidden units and is widely used as a feature extractor, a generator, a classifier, a pre-training method for deep neural networks, etc. However, the RBM is not sufficient to represent sequential data because it assumes that each frame of a sequence is completely independent of the others. Unlike conventional RBMs, the LSTBM has connections over time via LSTM units and represents time dependencies in sequential data. Our speech coding experiments demonstrated that the proposed LSTBM outperformed two conventional methods: an RBM and a temporal RBM.
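To make the frame-independence limitation concrete, the following is a minimal sketch of a standard binary RBM trained with one step of contrastive divergence (CD-1). This is not the paper's LSTBM; all dimensions, names, and hyperparameters here are illustrative assumptions. Note that each frame is updated in isolation, with no connection between time steps, which is exactly what the LSTBM's LSTM units add.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not taken from the paper).
V, H = 6, 4                          # visible / hidden unit counts
W = rng.normal(0.0, 0.1, size=(V, H))  # visible-hidden weights
b = np.zeros(V)                        # visible biases
c = np.zeros(H)                        # hidden biases

def sample_h_given_v(v):
    """Hidden activation probabilities and a binary sample."""
    p = sigmoid(v @ W + c)
    return p, (rng.random(H) < p).astype(float)

def sample_v_given_h(h):
    """Visible reconstruction probabilities and a binary sample."""
    p = sigmoid(h @ W.T + b)
    return p, (rng.random(V) < p).astype(float)

def cd1_update(v0, lr=0.1):
    """One CD-1 parameter update on a single frame v0.

    Each frame is treated independently: nothing here carries
    state from one time step to the next.
    """
    global W, b, c
    ph0, h0 = sample_h_given_v(v0)     # positive phase
    pv1, v1 = sample_v_given_h(h0)     # one Gibbs step
    ph1, _ = sample_h_given_v(v1)      # negative phase
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

# Train on a few random binary "frames", one at a time.
for _ in range(10):
    frame = (rng.random(V) < 0.5).astype(float)
    cd1_update(frame)
```

In a temporal extension such as the LSTBM, the hidden (or bias) parameters would instead be conditioned on recurrent state carried across frames, rather than being fixed across the whole sequence as above.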


DOI: 10.21437/Interspeech.2018-1753

Cite as: Nakashika, T. (2018) LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory. Proc. Interspeech 2018, 2529-2533, DOI: 10.21437/Interspeech.2018-1753.


@inproceedings{Nakashika2018,
  author={Toru Nakashika},
  title={LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2529--2533},
  doi={10.21437/Interspeech.2018-1753},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1753}
}