Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model

Muhammed Shifas PV, Vassilis Tsiaras, Yannis Stylianou


Low speech intelligibility in noisy listening conditions makes communication with others more difficult. Various strategies have been suggested for modifying a speech signal before it is presented in a noisy listening environment, with the goal of increasing its intelligibility. A state-of-the-art approach, referred to as Spectral Shaping and Dynamic Range Compression (SSDRC), modifies the spectral and temporal structure of clean speech and has been shown to considerably improve the intelligibility of speech in noisy listening conditions. In this paper, we present a non-causal Wavenet-like model for mapping clean speech samples to samples generated by SSDRC. A successful non-linear mapping function could be used (a) to improve the intelligibility of noisy speech and (b) as a model-based intelligibility improvement layer in Wavenet-based speech synthesizers. Objective and subjective results show that the Wavenet-based mapping function reproduces the intelligibility gains of SSDRC while considerably improving the quality of the modified signal compared with that obtained by SSDRC.


DOI: 10.21437/Interspeech.2018-2119

Cite as: PV, M.S., Tsiaras, V., Stylianou, Y. (2018) Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model. Proc. Interspeech 2018, 1868-1872, DOI: 10.21437/Interspeech.2018-2119.


@inproceedings{PV2018,
  author={Muhammed Shifas PV and Vassilis Tsiaras and Yannis Stylianou},
  title={Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1868--1872},
  doi={10.21437/Interspeech.2018-2119},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2119}
}