Binaural Speech Intelligibility Estimation Using Deep Neural Networks

Kazuhiro Kondo, Kazuya Taira, Yosuke Kobayashi


We attempted to estimate the speech intelligibility of binaural speech signals with additive noise. The assumption here is that both the target speech and the noise are directional sources. In this case, when the speech and noise sources are spatially separated, intelligibility generally improves since the human auditory system can segregate the two sources. However, since intelligibility tests are commonly conducted on monaurally recorded signals, intelligibility is often underestimated relative to live human listeners because this segregation capability is neglected. We previously proposed using binaurally recorded signals to estimate speech intelligibility, and compared the estimation accuracy of several machine learning methods on these signals. We showed that random forests (RF) combined with a better-ear model and Mel filter banks gave the highest accuracy among the methods compared, which included support vector machines and logistic regression. In this paper, we introduce deep neural networks (DNN) to this task. Initial evaluation results show that DNNs provide a modest improvement over RF.
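The better-ear Mel filter-bank front end referred to in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple better-ear rule that takes the per-band maximum over the log-Mel energies of the left and right channels, and the function names, FFT size, and filter count are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filterbank matrix, shape (n_filters, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):              # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):             # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def better_ear_mel_features(left, right, sr=16000, n_fft=512, n_filters=24):
    """Log-Mel band energies per ear, combined by a per-band maximum
    (a simple better-ear rule; the paper's exact model may differ)."""
    fb = mel_filterbank(n_filters, n_fft, sr)
    per_ear = []
    for ch in (left, right):
        spec = np.abs(np.fft.rfft(ch, n_fft)) ** 2  # power spectrum of one frame
        per_ear.append(np.log(fb @ spec + 1e-10))
    return np.maximum(per_ear[0], per_ear[1])
```

Features of this form, computed frame by frame, would then be fed to the classifier (RF or DNN) to predict an intelligibility score.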


DOI: 10.21437/Interspeech.2018-27

Cite as: Kondo, K., Taira, K., Kobayashi, Y. (2018) Binaural Speech Intelligibility Estimation Using Deep Neural Networks. Proc. Interspeech 2018, 1858-1862, DOI: 10.21437/Interspeech.2018-27.


@inproceedings{Kondo2018,
  author={Kazuhiro Kondo and Kazuya Taira and Yosuke Kobayashi},
  title={Binaural Speech Intelligibility Estimation Using Deep Neural Networks},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1858--1862},
  doi={10.21437/Interspeech.2018-27},
  url={http://dx.doi.org/10.21437/Interspeech.2018-27}
}