Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion

Neil Shah, Nirmesh Shah, Hemant Patil


The murmur produced by the speaker and captured by Non-Audible Murmur (NAM), one of the Silent Speech Interface (SSI) techniques, suffers from degraded speech quality. This is due to the lack of radiation effect at the lips and the low-pass nature of the soft tissue, which attenuates high-frequency information. In this work, a novel method for NAM-to-Whisper (NAM2WHSP) speech conversion incorporating a Generative Adversarial Network (GAN) is proposed. The GAN minimizes the distributional divergence between the whispered speech parameters and the generated speech parameters through adversarial optimization. The objective and subjective evaluations performed on the proposed system demonstrate the advantage of adversarial optimization over Maximum Likelihood (ML)-based optimization networks, such as a Deep Neural Network (DNN), in preserving and improving speech quality and intelligibility. The adversarial optimization learns the mapping function with a 54.2% relative improvement in Mean Opinion Score (MOS) and a 29.83% absolute reduction in Word Error Rate (% WER) with respect to the state-of-the-art mapping techniques. Furthermore, we evaluated the proposed framework by analyzing the level of contextual information and the number of training utterances required for optimizing the network parameters, for the given task and database.
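To make the contrast between ML-based and adversarial optimization concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the ML-based DNN is trained with a mean squared error objective (equivalent to maximum likelihood under a fixed-variance Gaussian output model) and that the GAN uses the standard binary cross-entropy losses with a non-saturating generator term. The function names and the toy discriminator outputs are hypothetical; the abstract does not specify the exact loss formulation used in the paper.

```python
import numpy as np


def mse_loss(predicted, target):
    """ML-style regression loss (assumed: fixed-variance Gaussian output model)."""
    return float(np.mean((predicted - target) ** 2))


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator outputs in (0, 1) for whispered speech features.
    d_fake: discriminator outputs in (0, 1) for generated features.
    The loss is low when d_real is near 1 and d_fake is near 0.
    """
    real_term = -np.mean(np.log(d_real + eps))
    fake_term = -np.mean(np.log(1.0 - d_fake + eps))
    return float(real_term + fake_term)


def generator_adversarial_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return float(-np.mean(np.log(d_fake + eps)))


if __name__ == "__main__":
    # Hypothetical discriminator outputs for a small batch of frames.
    d_real = np.array([0.9, 0.8, 0.95])
    d_fake = np.array([0.2, 0.1, 0.3])
    print("discriminator loss:", discriminator_loss(d_real, d_fake))
    print("generator adversarial loss:", generator_adversarial_loss(d_fake))
```

In this sketch the MSE term compares generated and target features frame by frame, whereas the adversarial terms depend only on how well the discriminator separates the two sets of features, which is what drives the generated distribution toward the whispered speech distribution.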


DOI: 10.21437/Interspeech.2018-1565

Cite as: Shah, N., Shah, N., Patil, H. (2018) Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion. Proc. Interspeech 2018, 3157-3161, DOI: 10.21437/Interspeech.2018-1565.


@inproceedings{Shah2018,
  author={Neil Shah and Nirmesh Shah and Hemant Patil},
  title={Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3157--3161},
  doi={10.21437/Interspeech.2018-1565},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1565}
}