We describe an approach to simulating different phonation types, following John Laver's terminology, by means of a hybrid (rule-based and unit-concatenating) formant synthesizer. Different voice qualities were generated by following hints from the literature and applying the revised KLGLOTT88 model. In a listener perception experiment, we show that listeners distinguish the phonation types and that these evoke the emotional impressions predicted by the literature. The synthesis system and its source code, as well as audio samples, can be downloaded at http://emoSyn.syntheticspeech.de/.
Cite as: Burkhardt, F. (2009) Rule-based voice quality variation with formant synthesis. Proc. Interspeech 2009, 2659-2662, doi: 10.21437/Interspeech.2009-499
@inproceedings{burkhardt09_interspeech,
  author={Felix Burkhardt},
  title={{Rule-based voice quality variation with formant synthesis}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2659--2662},
  doi={10.21437/Interspeech.2009-499}
}