Quality Degradation Diagnosis for Voice Networks — Estimating the Perceived Noisiness, Coloration, and Discontinuity of Transmitted Speech

Gabriel Mittag, Sebastian Möller


We present a single-ended quality diagnosis model for super-wideband speech communication networks that predicts the perceived Noisiness, Coloration, and Discontinuity of transmitted speech. The model extends the single-ended speech quality prediction model NISQA and can additionally indicate the cause of a quality degradation. Because it is based on universal perceptual quality dimensions, service providers can use the model independently of the communication system’s technology. The prediction model consists of a convolutional neural network that first calculates per-frame features of a speech signal and a recurrent neural network that then aggregates these features over time to estimate the speech quality dimensions. The proposed diagnosis model achieves promising results, with an average RMSE* of 0.24.
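The abstract reports its result as RMSE* without defining it. The sketch below assumes that RMSE* denotes the epsilon-insensitive RMSE described in ITU-T Rec. P.1401, in which a prediction error counts only to the extent that it exceeds the 95% confidence interval of the subjective rating. The function name `rmse_star`, its parameters, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse_star(mos_subjective, mos_predicted, ci95, d=1):
    """Epsilon-insensitive RMSE (assumed here to follow ITU-T P.1401).

    mos_subjective: per-condition subjective scores
    mos_predicted:  per-condition predicted scores
    ci95:           per-condition 95% confidence intervals of the subjective scores
    d:              degrees of freedom of the mapping function
                    (assumption: 1 without mapping, 4 for a third-order polynomial)
    """
    n = len(mos_subjective)
    if not (n == len(mos_predicted) == len(ci95)):
        raise ValueError("all inputs must have the same length")
    if n <= d:
        raise ValueError("need more conditions than degrees of freedom")

    squared_sum = 0.0
    for s, p, ci in zip(mos_subjective, mos_predicted, ci95):
        # Errors that fall inside the confidence interval are treated as zero.
        perror = max(0.0, abs(s - p) - ci)
        squared_sum += perror ** 2

    return math.sqrt(squared_sum / (n - d))


# Toy data for one quality dimension (hypothetical values, not from the paper).
subjective = [3.0, 2.0, 4.0, 1.5]
predicted = [3.1, 2.6, 3.4, 1.5]
intervals = [0.2, 0.2, 0.2, 0.2]
score = rmse_star(subjective, predicted, intervals)
print(round(score, 3))
```

Under this reading, an average RMSE* would be obtained by computing the value separately for Noisiness, Coloration, and Discontinuity and taking the mean of the three results.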


DOI: 10.21437/Interspeech.2019-2636

Cite as: Mittag, G., Möller, S. (2019) Quality Degradation Diagnosis for Voice Networks — Estimating the Perceived Noisiness, Coloration, and Discontinuity of Transmitted Speech. Proc. Interspeech 2019, 3426-3430, DOI: 10.21437/Interspeech.2019-2636.


@inproceedings{Mittag2019,
  author={Gabriel Mittag and Sebastian Möller},
  title={{Quality Degradation Diagnosis for Voice Networks — Estimating the Perceived Noisiness, Coloration, and Discontinuity of Transmitted Speech}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3426--3430},
  doi={10.21437/Interspeech.2019-2636},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2636}
}