Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs

G. Nisha Meenakshi, Prasanta Kumar Ghosh


We propose a whispered speech to neutral speech conversion system based on bidirectional long short-term memory (BLSTM) networks that uses the STRAIGHT speech synthesizer. One BLSTM maps the spectral features of whispered speech to those of neutral speech. Three further BLSTMs predict the pitch, the periodicity levels, and the voiced/unvoiced phoneme decisions from the spectral features of whispered speech. Using data recorded from six subjects in a four-fold setup, we quantify the quality of the predicted spectral features and excitation parameters with objective measures. We find that the temporal smoothness of the spectral features predicted by the proposed BLSTM-based system is statistically significantly higher than that of the features predicted by the deep neural network (DNN) based baseline schemes. We also observe that, while the proposed system performs comparably to the baseline scheme in pitch prediction, it is superior in classifying voicing decisions and in predicting periodicity levels. In a subjective listening test, the proposed method is chosen as the best-performing scheme 26.61% (absolute) more often than the best baseline scheme, which indicates that it produces more natural-sounding neutral speech from whispered speech.
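The abstract describes four BLSTMs, all driven by whispered-speech spectral features: one regresses neutral spectral features, and three predict the excitation parameters (pitch, periodicity levels, voicing) that STRAIGHT needs. The following is a minimal pure-Python sketch of that data flow only. It assumes a single BLSTM layer per network, randomly initialised (untrained) weights, hypothetical feature and hidden sizes, and a hypothetical `convert` helper; it does not reproduce the authors' feature extraction, training setup, network sizes, or the STRAIGHT synthesis step.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def _rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """One-direction LSTM layer; weights are random placeholders, not trained."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.hid = hid_dim
        # Stacked weights for the input, forget, cell and output gates,
        # applied to the concatenation [x_t, h_{t-1}].
        self.W = _rand_matrix(rng, 4 * hid_dim, in_dim + hid_dim)
        self.b = [0.0] * (4 * hid_dim)

    def run(self, xs: List[Vector]) -> List[Vector]:
        H = self.hid
        h, c, outs = [0.0] * H, [0.0] * H, []
        for x in xs:
            z = [zi + bi for zi, bi in zip(_matvec(self.W, x + h), self.b)]
            i = [_sigmoid(v) for v in z[:H]]
            f = [_sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [_sigmoid(v) for v in z[3 * H:]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outs.append(h)
        return outs


class BLSTMRegressor:
    """Forward and backward LSTMs, concatenated, then a per-frame linear output."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int, seed: int):
        rng = random.Random(seed)
        self.fwd = LSTM(in_dim, hid_dim, rng)
        self.bwd = LSTM(in_dim, hid_dim, rng)
        self.W_out = _rand_matrix(rng, out_dim, 2 * hid_dim)

    def __call__(self, xs: List[Vector]) -> List[Vector]:
        hf = self.fwd.run(xs)
        hb = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with frame order
        return [_matvec(self.W_out, f + b) for f, b in zip(hf, hb)]


def convert(frames: List[Vector], n_spec: int, n_bands: int, hid: int = 8):
    """Hypothetical pipeline: whispered spectral frames -> STRAIGHT-style parameters."""
    spectral = BLSTMRegressor(n_spec, hid, n_spec, seed=1)       # whisper -> neutral spectrum
    pitch = BLSTMRegressor(n_spec, hid, 1, seed=2)               # per-frame F0
    periodicity = BLSTMRegressor(n_spec, hid, n_bands, seed=3)   # band periodicity levels
    voicing = BLSTMRegressor(n_spec, hid, 1, seed=4)             # voiced/unvoiced logit

    spec = spectral(frames)
    vuv = [_sigmoid(v[0]) > 0.5 for v in voicing(frames)]
    # Assumed convention: F0 is set to 0 in frames classified as unvoiced.
    f0 = [p[0] if v else 0.0 for p, v in zip(pitch(frames), vuv)]
    per = periodicity(frames)
    return spec, f0, per, vuv
```

The backward LSTM is run on the time-reversed sequence and its outputs are reversed again before concatenation, so every output frame depends on the whole utterance, both past and future frames. With trained weights, `convert` would be called once per utterance on a list of whispered spectral frames, and its four outputs would be handed to the synthesizer.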


DOI: 10.21437/Interspeech.2018-1487

Cite as: Meenakshi, G.N., Ghosh, P.K. (2018) Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs. Proc. Interspeech 2018, 491-495, DOI: 10.21437/Interspeech.2018-1487.


@inproceedings{Meenakshi2018,
  author={G. Nisha Meenakshi and Prasanta Kumar Ghosh},
  title={Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={491--495},
  doi={10.21437/Interspeech.2018-1487},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1487}
}