7th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2011)

Florence, Italy
August 25-27, 2011

Cepstral Analysis of Perceptually Rated Synthetic Disordered Speech Stimuli

A. Alpan (1), Francis Grenez (1), Jean Schoentgen (1,2)

(1) Laboratoires d'Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Brussels, Belgium
(2) National Fund for Scientific Research, Belgium

A number of studies have shown that the amplitude of the first rahmonic peak (R1) in the cepstrum may indicate hoarse voice quality. The cepstrum is obtained by taking the inverse Fourier transform of the log-magnitude spectrum. The goal of this article is to apply cepstral analysis to a perceptually evaluated corpus of synthetic stimuli in order to learn about the link between the signal properties (fixed by the synthesizer parameters) and the first rahmonic peak. The synthetic stimuli have been generated by a synthesizer of disordered voices that has been shown to generate natural-sounding speech fragments comprising different vocal perturbations. A second objective is to examine the link between the first rahmonic peak and perceived breathiness and roughness, a link that has not been studied previously. The speech stimuli have been perceptually assessed by nine listeners according to grade, breathiness and roughness. Several cepstral analysis alternatives have been implemented, including period-synchronous temporal frames and harmonic-synchronous band-limited analyses.
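
As an illustration of the definition given in the abstract, the following is a minimal sketch of a frame-based cepstrum and of a first rahmonic peak search. It is not the authors' implementation and does not reproduce the period-synchronous or harmonic-synchronous variants. The function names (real_cepstrum, first_rahmonic), the Hann window, the 60-400 Hz search range and the toy pulse-train input are all assumptions made for the example.

```python
import numpy as np


def real_cepstrum(frame):
    """Inverse Fourier transform of the log-magnitude spectrum of one frame."""
    windowed = frame * np.hanning(len(frame))    # Hann window: an assumption
    spectrum = np.fft.fft(windowed)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    return np.fft.ifft(log_mag).real


def first_rahmonic(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (quefrency in seconds, amplitude) of the largest cepstral peak
    inside the quefrency range that corresponds to f0_min..f0_max.
    The search range is hypothetical, not taken from the paper."""
    cep = real_cepstrum(frame)
    q_lo = int(np.floor(fs / f0_max))            # shortest period, in samples
    q_hi = int(np.ceil(fs / f0_min))             # longest period, in samples
    idx = q_lo + int(np.argmax(cep[q_lo:q_hi + 1]))
    return idx / fs, cep[idx]


# Toy input (an assumption): a 100 Hz pulse train with weak additive noise.
fs = 16000
f0 = 100.0
n = int(0.1 * fs)                                # 100 ms frame
period = int(fs / f0)                            # 160 samples
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(n)
signal[period // 2::period] += 1.0               # one pulse per cycle

quefrency, amplitude = first_rahmonic(signal, fs)
print(f"R1 quefrency: {quefrency * 1000:.2f} ms, amplitude: {amplitude:.3f}")
```

For a strongly periodic input such as this one, the peak is expected near the quefrency of one glottal cycle (1/f0); in a sketch of this kind, added noise or cycle-to-cycle perturbations would typically lower the peak amplitude.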

Index Terms. cepstral analysis, synthetic disordered speech, first rahmonic amplitude

Full Paper (reprinted with permission from Firenze University Press)

Bibliographic reference. Alpan, A. / Grenez, Francis / Schoentgen, Jean (2011): "Cepstral analysis of perceptually rated synthetic disordered speech stimuli", In MAVEBA-2011, 131-134.