Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations

Aaron Nicolson, Kuldip K. Paliwal


An accurate Ideal Binary Mask (IBM) estimate is essential for Missing Feature Theory (MFT)-based speaker identification, because incorrectly labelled spectral components (each component is labelled either reliable or unreliable) degrade the performance of an Automatic Speaker Identification (ASI) system in the presence of noise. In this work, a Bidirectional Recurrent Neural Network (BRNN) with Long Short-Term Memory (LSTM) cells is proposed for improved IBM estimation. Compared with the previously proposed Multilayer Perceptron (MLP)-IBM estimator, the proposed system achieved an average improvement of 4.5% in IBM estimate accuracy and an average improvement of 3.1% in MFT-based speaker identification accuracy across all tested SNR levels. When used for speech enhancement, the proposed system achieved, relative to the MLP-IBM estimator, an average improvement of 0.32 in MOS-LQO (an objective quality measure) and an average improvement of 0.01 in QSTI (an objective intelligibility measure) across all tested SNR levels. These results highlight the effectiveness of the proposed BRNN-IBM estimator for both MFT-based speaker identification and IBM-based speech enhancement.
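The IBM labelling, the IBM estimate accuracy measure, and a bidirectional LSTM mask estimator can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The 0 dB local criterion, the noisy log-power input features, the single-layer architecture, the layer sizes and all function names (ideal_binary_mask, mask_accuracy, lstm_pass, blstm_mask, init_lstm) are hypothetical choices made for the example. The weights are random rather than trained, so the printed accuracy is not meaningful; it only shows how an estimated mask is scored against the IBM.

```python
import numpy as np


def ideal_binary_mask(clean_power, noise_power, lc_db=0.0):
    """Label each time-frequency component reliable (1) or unreliable (0).

    A component is reliable when its local SNR exceeds the local criterion
    lc_db. Assumption: the 0 dB default is illustrative, not the paper's value.
    """
    eps = 1e-12  # avoid division by zero and log of zero
    local_snr_db = 10.0 * np.log10((clean_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.int8)


def mask_accuracy(estimated_mask, ibm):
    """Fraction of components whose reliable/unreliable label matches the IBM."""
    return float(np.mean(estimated_mask == ibm))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(x, W, U, b):
    """Run one LSTM layer over frames x (T x D) and return hidden states (T x H).

    W: (4H x D), U: (4H x H), b: (4H,); gate order: input, forget, cell, output.
    """
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    out = np.empty((x.shape[0], H))
    for t in range(x.shape[0]):
        z = W @ x[t] + U @ h + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)  # update the memory cell
        h = o * np.tanh(c)          # gated hidden state
        out[t] = h
    return out


def blstm_mask(x, fwd, bwd, V, d):
    """Bidirectional LSTM: each frame sees both past and future context."""
    h_fwd = lstm_pass(x, *fwd)
    h_bwd = lstm_pass(x[::-1], *bwd)[::-1]      # run on reversed frames, then re-align
    h = np.concatenate([h_fwd, h_bwd], axis=1)  # (T x 2H)
    probs = sigmoid(h @ V.T + d)                # per-bin probability of being reliable
    return (probs > 0.5).astype(np.int8)


def init_lstm(rng, D, H):
    """Hypothetical small random initialisation (untrained weights)."""
    return (rng.normal(0, 0.1, (4 * H, D)), rng.normal(0, 0.1, (4 * H, H)), np.zeros(4 * H))


rng = np.random.default_rng(0)
T, K, H = 50, 64, 32  # frames, frequency bins, hidden units (illustrative sizes)
clean = rng.gamma(1.0, 1.0, (T, K))  # synthetic clean-speech power spectrum
noise = rng.gamma(1.0, 1.0, (T, K))  # synthetic noise power spectrum
ibm = ideal_binary_mask(clean, noise)

features = np.log(clean + noise + 1e-12)  # assumption: noisy log-power spectrum as input
fwd, bwd = init_lstm(rng, K, H), init_lstm(rng, K, H)
V, d = rng.normal(0, 0.1, (K, 2 * H)), np.zeros(K)
est = blstm_mask(features, fwd, bwd, V, d)
print("IBM estimate accuracy:", mask_accuracy(est, ibm))
```

In a trained system the weights would be fitted, for example by minimising binary cross-entropy between the sigmoid outputs and the IBM. The backward pass gives each frame's reliable/unreliable decision access to future as well as past frames, which a purely frame-by-frame feed-forward estimator lacks.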


DOI: 10.21437/Interspeech.2018-1134

Cite as: Nicolson, A., Paliwal, K.K. (2018) Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations. Proc. Interspeech 2018, 1606-1610, DOI: 10.21437/Interspeech.2018-1134.


@inproceedings{Nicolson2018,
  author={Aaron Nicolson and Kuldip K. Paliwal},
  title={Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1606--1610},
  doi={10.21437/Interspeech.2018-1134},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1134}
}