Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis

Peter Birkholz, Susanne Drechsel, Simon Stone


We present a geometric vocal fold model that describes the glottal area between the lower and upper vocal fold edges as a function of time. It is based on a glottis model by Titze [J. Acoust. Soc. Am., 75(2), 570–580 (1984)] and has been enhanced to allow the generation of skewed (asymmetric) glottal area waveforms and diplophonic double pulsing. Embedded in the articulatory speech synthesizer VocalTractLab, the model was used for the synthesis of German words with a range of settings for the vocal fold model parameters to generate different male and female voices. A perception experiment was conducted to determine the parameter settings that generate the most natural-sounding voices. The most natural-sounding male voice was generated with a slightly divergent prephonatory glottal shape, with a phase delay of 70° between the lower and upper vocal fold edges, symmetric glottal area pulses, and a little shimmer (double pulsing). The most natural-sounding female voice was generated with a straight prephonatory glottal channel, with a phase delay of 50° between the vocal fold edges, slightly asymmetric glottal area pulses, and a little shimmer.
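To make the idea of a glottal area governed by two phase-shifted vocal fold edges concrete, the following is a minimal sketch and not the authors' model. It assumes that each edge oscillates sinusoidally around a rest (prephonatory) displacement, that the upper edge lags the lower edge by a fixed phase delay, and that the narrower of the two edges determines the area. The function names, the fold length, the amplitude, and the rest displacements are hypothetical illustration values. Only the 70° phase delay is taken from the abstract. Pulse skewing and double pulsing are not modelled here.

```python
import math


def glottal_area(t, f0=120.0, length_cm=1.3, amp_cm=0.1,
                 rest_lower_cm=0.0, rest_upper_cm=0.01,
                 phase_delay_deg=70.0):
    """Glottal area in cm^2 at time t (seconds) for a toy two-edge model.

    All default values except the 70 degree phase delay are assumptions
    chosen for illustration; they are not taken from the paper.
    """
    phase = 2.0 * math.pi * f0 * t
    delay = math.radians(phase_delay_deg)
    # Half-width of the glottis at each edge: rest displacement plus oscillation.
    lower = rest_lower_cm + amp_cm * math.sin(phase)
    # The upper edge lags the lower edge by the phase delay.
    upper = rest_upper_cm + amp_cm * math.sin(phase - delay)
    # The narrower edge limits the opening; a negative width means closure.
    half_width = max(0.0, min(lower, upper))
    return 2.0 * length_cm * half_width


def area_waveform(f0=120.0, sample_rate=44100, n_periods=1, **kwargs):
    """Sample the glottal area over n_periods oscillation periods."""
    n = int(round(n_periods * sample_rate / f0))
    return [glottal_area(i / sample_rate, f0=f0, **kwargs) for i in range(n)]


if __name__ == "__main__":
    wave = area_waveform()
    open_fraction = sum(1 for a in wave if a > 0.0) / len(wave)
    print(f"peak area: {max(wave):.4f} cm^2, open fraction: {open_fraction:.2f}")
```

In this simplified setting, a larger rest displacement at the upper edge than at the lower edge corresponds to a divergent prephonatory shape, and increasing the phase delay shortens the part of the cycle during which both edges are open.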


DOI: 10.21437/Interspeech.2019-2410

Cite as: Birkholz, P., Drechsel, S., Stone, S. (2019) Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis. Proc. Interspeech 2019, 3765-3769, DOI: 10.21437/Interspeech.2019-2410.


@inproceedings{Birkholz2019,
  author={Peter Birkholz and Susanne Drechsel and Simon Stone},
  title={{Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3765--3769},
  doi={10.21437/Interspeech.2019-2410},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2410}
}