On the Use of Pitch Features for Disordered Speech Recognition

Shansong Liu, Shoukang Hu, Xunying Liu, Helen Meng

Pitch features have long been known to be useful for recognition of normal speech. However, for disordered speech, the significant degradation of voice quality means that prosodic features such as pitch are not always useful, particularly when the underlying condition, for example damage to the cerebellum, strongly affects prosody control. Hence, both acoustic and prosodic information can be distorted. To the best of our knowledge, there has been very limited research on using pitch features for disordered speech recognition. In this paper, a comparative study of multiple approaches designed to incorporate pitch features is conducted to improve the performance of two disordered speech recognition tasks: English UASpeech and Cantonese CUDYS. A novel gated neural network (GNN) based approach is used to improve acoustic and pitch feature integration over conventional concatenation of the two. Bayesian estimation of GNNs is also investigated to further improve their robustness.
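To make the contrast between plain concatenation and gated integration concrete, the sketch below compares the two on a single frame. It is an illustration under stated assumptions, not the GNN described in the paper: the gate here is a generic sigmoid gate computed from both streams that scales each pitch dimension into [0, 1] before concatenation. The functions `concat_fusion` and `gated_fusion` and the parameters `w_gate` and `b_gate` are hypothetical names, and the Bayesian estimation of gate parameters studied in the paper is not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a row-major matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def concat_fusion(acoustic: Vector, pitch: Vector) -> Vector:
    """Baseline: append the pitch features to the acoustic features unchanged."""
    return acoustic + pitch


def gated_fusion(acoustic: Vector, pitch: Vector,
                 w_gate: Matrix, b_gate: Vector) -> Vector:
    """Hypothetical gated fusion (an assumption, not the paper's exact model).

    A sigmoid gate, one value per pitch dimension, is computed from the
    concatenated acoustic and pitch features. Each pitch feature is scaled
    by its gate, so a trained gate could attenuate unreliable pitch values
    instead of passing them through as concatenation does.
    """
    gate = [sigmoid(z) for z in affine(w_gate, acoustic + pitch, b_gate)]
    return acoustic + [g * p for g, p in zip(gate, pitch)]


if __name__ == "__main__":
    acoustic = [1.0, 2.0, 3.0, 4.0]   # toy 4-dim acoustic frame
    pitch = [0.2, -1.0, 0.5]          # toy 3-dim pitch frame
    # Gate weights map the 7-dim joint input to 3 gate values.
    w = [[0.0] * 7 for _ in range(3)]
    print(concat_fusion(acoustic, pitch))
    # With all-zero weights and bias every gate is 0.5, halving the pitch part.
    print(gated_fusion(acoustic, pitch, w, [0.0] * 3))
    # prints [1.0, 2.0, 3.0, 4.0, 0.1, -0.5, 0.25]
```

In this toy setup the gate parameters are fixed by hand; in a real system they would be learned jointly with the acoustic model, and a strongly negative bias drives the gate toward zero, effectively switching the pitch stream off.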

DOI: 10.21437/Interspeech.2019-2609

Cite as: Liu, S., Hu, S., Liu, X., Meng, H. (2019) On the Use of Pitch Features for Disordered Speech Recognition. Proc. Interspeech 2019, 4130-4134, DOI: 10.21437/Interspeech.2019-2609.

  author={Shansong Liu and Shoukang Hu and Xunying Liu and Helen Meng},
  title={{On the Use of Pitch Features for Disordered Speech Recognition}},
  booktitle={Proc. Interspeech 2019},