Most previous studies on acoustic assessment of disordered voice have focused on extracting perturbation features from isolated vowels produced with steady-state phonation. Natural speech, however, is considered preferable for clinical practice in terms of flexibility, effectiveness and reliability. This paper presents an investigation into applying automatic speech recognition (ASR) technology to disordered voice assessment of Cantonese speakers. A DNN-based ASR system is trained using phonetically rich continuous utterances from normal speakers. It was found that frame-level phone posteriors obtained from the ASR system are strongly correlated with the severity level of voice disorder. Phone posteriors in utterances with severe disorder exhibit significantly larger variation than those in utterances with mild disorder. A set of utterance-level posterior features is computed to quantify such variation for pattern recognition purposes. An SVM-based classifier is used to classify an input utterance into the categories of mild, moderate and severe disorder. The two-class classification accuracy for mild versus severe disorder is 90.3%, while significant confusion between mild and moderate disorders is observed. For some of the subjects with severe voice disorder, the classification results are highly inconsistent across individual utterances. Furthermore, short utterances tend to incur more classification errors.
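As a rough illustration of how frame-level phone posteriors might be condensed into utterance-level features that quantify variation, a minimal sketch follows. The abstract does not list the actual features, so the four statistics used here (mean and standard deviation of per-frame entropy and of the top posterior) are assumptions chosen for illustration, and the function names `frame_entropy` and `utterance_features` as well as the toy posteriors are hypothetical.

```python
import math
from statistics import mean, pstdev


def frame_entropy(posterior):
    """Shannon entropy (in nats) of one frame's phone posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def utterance_features(posteriors):
    """Summarise frame-level phone posteriors as a fixed-length feature vector.

    posteriors: list of frames, each a list of phone probabilities summing to 1.
    Returns [mean entropy, std of entropy, mean top posterior, std of top posterior].
    These statistics are illustrative assumptions, not the paper's feature set.
    """
    if not posteriors:
        raise ValueError("utterance has no frames")
    entropies = [frame_entropy(frame) for frame in posteriors]
    top = [max(frame) for frame in posteriors]
    return [mean(entropies), pstdev(entropies), mean(top), pstdev(top)]


# Toy posteriors over three hypothetical phone classes (not real data).
stable = [[0.9, 0.05, 0.05]] * 4
unstable = [
    [0.9, 0.05, 0.05],
    [0.34, 0.33, 0.33],
    [0.1, 0.8, 0.1],
    [0.4, 0.3, 0.3],
]

stable_feats = utterance_features(stable)
unstable_feats = utterance_features(unstable)
```

Vectors of this kind, one per utterance, could then be passed to an SVM or any other standard classifier. The fixed length is the point of the design: utterances of different durations map to the same feature dimension.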
Cite as: Liu, Y., Lee, T., Ching, P.C., Law, T.K.T., Lee, K.Y.S. (2017) Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features. Proc. Interspeech 2017, 2680-2684, doi: 10.21437/Interspeech.2017-280
@inproceedings{liu17d_interspeech,
  author={Yuanyuan Liu and Tan Lee and P.C. Ching and Thomas K.T. Law and Kathy Y.S. Lee},
  title={{Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2680--2684},
  doi={10.21437/Interspeech.2017-280}
}