Many glottal source models have been proposed, but none has been systematically validated perceptually. Our previous work showed that how well a model fits the negative peak of the flow derivative is the most important predictor of perceptual similarity to the target voice. In this study, a new voice source model is proposed to capture perceptually important aspects of source shape. This new model, along with four other source models, was fitted to 40 voice sources (20 male and 20 female) obtained by inverse filtering and analysis-by-synthesis (AbS) of natural speech samples. We generated synthetic copies of the voices using each modeled source pulse, with all other synthesis parameters held constant, and then conducted a visual sort-and-rate task in which listeners assessed the perceived similarity between the target voice samples and each copy. Results showed that the proposed model provided a more accurate fit and a better perceptual match to the target than did the other models.
Bibliographic reference. Chen, Gang / Garellek, Marc / Kreiman, Jody / Gerratt, Bruce R. / Alwan, Abeer (2013): "A perceptually and physiologically motivated voice source model", In INTERSPEECH-2013, 2001-2005.