7th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2011)
The presentation concerns the assessment of how well human raters or speech-processing software detect glottal cycles in speech sounds and measure their lengths in synthetic breathy and rough voices. Synthesis of hoarse voices designates here the generation of speech sounds whose timbre simulates the voice quality of dysphonic speakers. The advantage of synthetically generated test stimuli is that the user fixes their properties and therefore knows them exactly. The corpus comprises synthetic vowels [a] that combine seven levels of frequency jitter with three levels of additive noise. The presentation focuses on the simulation of rough and breathy voices via frequency modulation of the glottal excitation model and the addition of pulsatile noise at the glottis. In addition, the true glottal cycle lengths and glottal source-to-noise ratios are obtained, against which the lengths and ratios inferred via signal processing may be compared. The glottal cycle lengths are obtained by tracking the phase of the harmonic driving functions of the speech sound synthesizer. The actual glottal signal-to-noise ratios are measured by separately summing, over each sound stimulus, the squared clean volume velocity samples and the squared pulsatile noise samples.
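The two reference measurements described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the decibel scale, the use of linear interpolation to locate cycle boundaries, and the choice of multiples of 2π as cycle boundaries are all assumptions made for illustration.

```python
import numpy as np


def glottal_snr_db(clean, noise):
    """Ratio of summed squared clean samples to summed squared noise samples.

    Expressing the ratio in dB is an assumption; the abstract only states
    that the squared samples are summed separately.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def cycle_lengths_from_phase(phase, fs):
    """Cycle lengths (in seconds) from a non-decreasing phase track in radians.

    A cycle boundary is assumed wherever the phase crosses a multiple of
    2*pi; the crossing instant is found by linear interpolation between
    samples (hypothetical choice, not taken from the paper).
    """
    cycles = np.asarray(phase, dtype=float) / (2.0 * np.pi)
    times = np.arange(cycles.size) / fs
    # Integer cycle counts that lie inside the observed phase range.
    k = np.arange(np.ceil(cycles[0]), np.floor(cycles[-1]) + 1)
    boundaries = np.interp(k, cycles, times)
    return np.diff(boundaries)


if __name__ == "__main__":
    fs = 10000.0                      # sampling rate in Hz (illustrative)
    n = np.arange(1000)
    phase = 2.0 * np.pi * 100.0 * n / fs   # constant 100 Hz, no jitter
    print(cycle_lengths_from_phase(phase, fs))
    print(glottal_snr_db(np.ones(4), 0.1 * np.ones(4)))
```

With a frequency-modulated phase track, the same function would return cycle lengths that vary from cycle to cycle, which is the kind of reference sequence against which detected cycle lengths could be compared.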
Index Terms. speech synthesis, breathiness, roughness, frequency jitter, amplitude shimmer, additive glottal noise
Full Paper (reprinted with permission from Firenze University Press)
Bibliographic reference. Fraj, S. Ben Elhadj / Grenez, Francis / Schoentgen, Jean (2011): "Synthesis of breathy and rough voices with a view to validating perceptual and automatic glottal cycle pattern recognition", In MAVEBA-2011, 135-138.