ISCA Archive ICSLP 1994

Preserving naturalness in synthetic voices while minimizing variation in formant frequencies and bandwidths

Niels-Jorn Dyhr, Marianne Elmlund, Carsten Henriksen

As a preliminary step toward improving the naturalness of the synthetic male and female voices in a Danish text-to-speech system based on a rule-driven formant synthesizer, the relative importance of the individual formant frequencies and bandwidths was investigated. Recordings of a Danish compound word consisting entirely of voiced segments were analyzed. Based on these recordings and their analysis, a number of manipulated synthetic stimuli, each simplifying the formant or bandwidth tracks in some way, were created and presented in two listening tests. The main results regarding these simplifications are: a) bandwidths (B5-B8) are more sensitive to simplification than formants (F5-F8); b) F5-F8 may be held constant throughout the utterance, and B1-B4 may be kept constant per segment, without perceptible loss of naturalness; c) B5-B8 may also be held constant, though with a minor loss of naturalness. Among the more comprehensive simplifications in the male voice, a hierarchy of acceptability was established. A similar approach has been tried with female synthetic voices, and preliminary results corroborate the results outlined above.
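
The two kinds of simplification named in the abstract (a parameter held constant throughout the utterance, and a parameter held constant per segment) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the frame values in Hz, the segment boundaries, and in particular the use of the mean as the constant value, since the abstract does not say how the constant values were chosen.

```python
from statistics import mean

def hold_constant(track):
    """Replace every frame value with one utterance-wide level.

    Assumption: the level is the mean of the track; the abstract does
    not state how the constant value was selected.
    """
    level = mean(track)
    return [level] * len(track)

def hold_constant_per_segment(track, segments):
    """Replace the values inside each (start, end) frame range with
    that range's mean. `end` is exclusive, as in Python slicing."""
    out = list(track)
    for start, end in segments:
        level = mean(track[start:end])
        out[start:end] = [level] * (end - start)
    return out

# Hypothetical per-frame values in Hz, not data from the paper.
f5 = [4400.0, 4500.0, 4600.0, 4500.0]   # a higher formant frequency track
b1 = [60.0, 80.0, 90.0, 110.0]          # a lower formant bandwidth track
segments = [(0, 2), (2, 4)]             # two hypothetical segments

f5_flat = hold_constant(f5)                        # constant over the utterance
b1_flat = hold_constant_per_segment(b1, segments)  # constant within each segment
```

In this sketch F5 becomes a single flat value across all frames, while B1 takes one value per segment and can still step at segment boundaries.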


Cite as: Dyhr, N.-J., Elmlund, M., Henriksen, C. (1994) Preserving naturalness in synthetic voices while minimizing variation in formant frequencies and bandwidths. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 751-754

@inproceedings{dyhr94_icslp,
  author={Niels-Jorn Dyhr and Marianne Elmlund and Carsten Henriksen},
  title={{Preserving naturalness in synthetic voices while minimizing variation in formant frequencies and bandwidths}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={751--754}
}