ISCA Archive Interspeech 2016

Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale

Simin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji

In the field of voice therapy, perceptual evaluation is widely used by expert listeners to evaluate pathological and normal voice quality. This approach is inherently subjective: it is subject to listener bias, and high inter- and intra-listener variability can be found. As such, research on automatic assessment of pathological voices using a combination of subjective and objective analyses has emerged. The present study aimed to develop a complementary automatic assessment system for voice quality based on the well-known GRBAS scale by using a battery of multidimensional acoustical measures through Deep Neural Networks. A total of 44 parameters, including Mel-frequency Cepstral Coefficients, Smoothed Cepstral Peak Prominence and Long-Term Average Spectrum, were adopted. In addition, the state-of-the-art automatic assessment system based on Modulation Spectrum (MS) features and GMM classifiers was used as a comparison system. The classification results using the proposed method revealed a moderate correlation with subjective GRBAS scores of dysphonic severity, and yielded better performance than the MS-GMM system, with the best accuracy around 81.53%. The findings indicate that such an assessment system can be used as an appropriate evaluation tool in determining the presence and severity of voice disorders.

doi: 10.21437/Interspeech.2016-986

Cite as: Xie, S., Yan, N., Yu, P., Ng, M.L., Wang, L., Ji, Z. (2016) Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale. Proc. Interspeech 2016, 2656-2660, doi: 10.21437/Interspeech.2016-986

  author={Simin Xie and Nan Yan and Ping Yu and Manwa L. Ng and Lan Wang and Zhuanzhuan Ji},
  title={{Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale}},
  booktitle={Proc. Interspeech 2016},