INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Intelligibility Classification of Pathological Speech Using Fusion of Multiple High Level Descriptors

Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth Narayanan

Signal Analysis and Interpretation Lab. (SAIL), University of Southern California, Los Angeles, USA

Pathological speech usually refers to speech distortion resulting from atypicalities in the voice and/or the articulatory mechanisms owing to disease, illness, or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could assist diagnosis and treatment design in these scenarios, the many sources and types of variability make it a very challenging computational problem. In this work, we design multiple subsystems, each addressing a different aspect of pathological speech. These subsystems are then fused at the binary hard-score level (intelligible or not intelligible) using Bayesian networks. Results show that subsystems such as the multiple-language phoneme probability subsystem, the prosodic and intonational subsystem, and the voice quality and pronunciation subsystem have discriminating power for intelligibility (9.8%, 17.1%, and 14.6% above chance, respectively). Noisy-Majority-based fusion reaches 66.4% accuracy, but fusion does not improve performance. Finally, voice-clustering-based joint classification, applied to reduce the misclassifications of the best subsystem, gives the best classification accuracy (79.9% on the development set and 76.8% on the test set).
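The abstract states that subsystem outputs are fused at the binary hard-score level with Bayesian networks, but it does not give the network structure. As a minimal sketch, assuming the simplest such network (a naive Bayes structure in which the intelligibility label is the only parent of every subsystem's binary decision), fusion could look like the code below. The function names and toy data are hypothetical and this is not the authors' implementation; the plain majority vote is included only for contrast and is not the paper's Noisy-Majority fusion, whose definition is not given here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Priors = Dict[int, float]
Likelihoods = Dict[int, List[float]]


def train_naive_bayes_fusion(
    decisions: Sequence[Sequence[int]],
    labels: Sequence[int],
    alpha: float = 1.0,
) -> Tuple[Priors, Likelihoods]:
    """Estimate P(class) and P(d_k = 1 | class) with Laplace smoothing.

    decisions[i][k] is subsystem k's hard decision (1 = intelligible,
    0 = not intelligible) for utterance i; labels[i] is the true class.
    Assumes a naive Bayes structure (hypothetical, not from the paper).
    """
    n_sub = len(decisions[0])
    counts = Counter(labels)
    n = len(labels)
    # Smoothed class priors so an absent class never gets probability 0.
    priors = {c: (counts[c] + alpha) / (n + 2 * alpha) for c in (0, 1)}
    p_one: Likelihoods = {}
    for c in (0, 1):
        rows = [d for d, y in zip(decisions, labels) if y == c]
        # Smoothed P(subsystem k says "intelligible" | class c).
        p_one[c] = [
            (sum(r[k] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for k in range(n_sub)
        ]
    return priors, p_one


def fuse(d: Sequence[int], priors: Priors, p_one: Likelihoods) -> int:
    """Return the class with the highest log posterior given hard decisions d."""
    scores = {}
    for c in (0, 1):
        s = math.log(priors[c])
        for k, dk in enumerate(d):
            p = p_one[c][k]
            s += math.log(p if dk == 1 else 1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)


def majority_vote(d: Sequence[int]) -> int:
    """Plain majority vote over hard decisions (ties go to class 0)."""
    return int(2 * sum(d) > len(d))


# Hypothetical toy data: subsystem 0 is reliable, subsystems 1 and 2 are noise.
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
train_decisions = [
    [1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1],
]

priors, p_one = train_naive_bayes_fusion(train_decisions, train_labels)
print(fuse([1, 0, 0], priors, p_one))  # 1: trusts the reliable subsystem
print(majority_vote([1, 0, 0]))        # 0: outvoted by the noisy subsystems
```

The design point this illustrates is that a learned fusion weights each subsystem by how reliable it was on training data, so one informative subsystem can override two uninformative ones, whereas an unweighted vote cannot.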

Index Terms: pathological speech, intelligibility of speech, fusion of multiple subsystems


Bibliographic reference.  Kim, Jangwon / Kumar, Naveen / Tsiartas, Andreas / Li, Ming / Narayanan, Shrikanth (2012): "Intelligibility classification of pathological speech using fusion of multiple high level descriptors", In INTERSPEECH-2012, 534-537.