9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Multi-Band and Multi-Cue Analyses of Disordered Connected Speech

A. Alpan (1), Y. Maryn (2), F. Grenez (1), A. Kacha (1), J. Schoentgen (1)

(1) Université Libre de Bruxelles, Belgium; (2) Sint-Jan General Hospital, Belgium

The objective is to analyze vocal dysperiodicities in connected speech produced by dysphonic speakers. The analysis relies on a method based on the speech variogram that tracks instantaneous vocal dysperiodicities. The dysperiodicity trace is summarized by the signal-to-dysperiodicity ratio, which has been shown to correlate strongly with the perceived degree of hoarseness of the speaker. Previously, this method had been evaluated only on small corpora. In the study reported here, the corpus comprises 28 normophonic and 223 dysphonic speakers. Its size has made it possible to carry out the analysis in multiple frequency bands and to submit the per-band signal-to-dysperiodicity ratios to multi-variable linear regression analysis with a view to predicting the perceptual ratings of the disordered speech fragments. The results are compared to the cepstral peak prominence, a cue that indirectly summarizes vocal dysperiodicities frame by frame via the size of the first rahmonic of the speech cepstrum. Results show that the signal-to-dysperiodicity ratios obtained for low-frequency bands up to 1500 Hz contribute most to the prediction of the perceptual scores. Also, combining the cepstral peak prominence with the low-frequency-band signal-to-dysperiodicity ratio increases their joint correlation with perceptual scores to 0.8.
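The two summary steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the signal-to-dysperiodicity ratio is the energy of the speech signal divided by the energy of the dysperiodicity trace, expressed in dB (the abstract does not spell out the definition), and that the multi-variable regression is an ordinary least-squares fit with an intercept. The function names and the use of NumPy are illustrative choices, and the variogram-based estimation of the dysperiodicity trace itself is not shown.

```python
import numpy as np


def signal_to_dysperiodicity_ratio(signal, dysperiodicity):
    """Return an energy ratio in dB (assumed definition, see lead-in).

    signal: samples of the speech signal, possibly band-limited.
    dysperiodicity: dysperiodicity trace for the same samples.
    """
    signal = np.asarray(signal, dtype=float)
    dysperiodicity = np.asarray(dysperiodicity, dtype=float)
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(dysperiodicity ** 2))


def fit_multiple_regression(cues, scores):
    """Least-squares fit of perceptual scores on several acoustic cues.

    cues: array of shape (n_speakers, n_cues), for instance one
        signal-to-dysperiodicity ratio per frequency band.
    scores: array of shape (n_speakers,) with perceptual ratings.
    Returns the coefficients (intercept first) and the correlation
    between predicted and observed scores.
    """
    cues = np.asarray(cues, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Prepend a column of ones so that the model includes an intercept.
    design = np.column_stack([np.ones(len(scores)), cues])
    coefficients, *_ = np.linalg.lstsq(design, scores, rcond=None)
    predicted = design @ coefficients
    correlation = np.corrcoef(predicted, scores)[0, 1]
    return coefficients, correlation
```

In such a setup, each column of `cues` would hold one cue per speaker, for example the per-band ratios with the cepstral peak prominence as an additional column, and the returned correlation would play the role of the combined correlation with perceptual scores mentioned in the abstract.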

