Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Preserving Naturalness in Synthetic Voices While Minimizing Variation in Formant Frequencies and Bandwidths

Niels-Jorn Dyhr, Marianne Elmlund, Carsten Henriksen

Tele Danmark Research, Horsholm, Denmark

As a preliminary step toward improving the naturalness of the synthetic male and female voices in a Danish text-to-speech system based on a rule-driven formant synthesizer, the relative importance of the individual formant frequencies and bandwidths has been investigated. Recordings of a Danish compound word consisting entirely of voiced segments were analyzed. Based on these recordings and their analysis, a number of manipulated synthetic stimuli were created and presented in two listening tests. The main results concerning these simplifications are: a) the bandwidths B5-B8 are more sensitive to simplification than the formant frequencies F5-F8; b) F5-F8 may be held constant throughout the utterance, and B1-B4 may be held constant within each segment, without perceptible loss of naturalness; c) B5-B8 may also be held constant, though with a minor loss of naturalness. A similar approach has been tried with female synthetic voices, and preliminary results corroborate those outlined above. Among the more comprehensive simplifications of the male voice, a hierarchy of acceptability was established.
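The abstract names two kinds of simplification: holding a parameter constant over the whole utterance (as for F5-F8) and holding it constant within each segment (as for B1-B4). Both can be illustrated on a frame-by-frame parameter track. This is a minimal sketch, not the authors' procedure. It assumes that the constant is the arithmetic mean of the track or of the segment. The function names, frame values, and segment boundaries are hypothetical.

```python
from statistics import mean


def hold_constant_per_utterance(track):
    """Replace every frame of a parameter track with one utterance-wide value.

    Assumption (not from the paper): the constant is the mean of the track.
    """
    m = mean(track)
    return [m] * len(track)


def hold_constant_per_segment(track, boundaries):
    """Replace the frames of each segment with that segment's mean value.

    `boundaries` is a hypothetical list of (start, end) frame indices,
    end exclusive. Each segment must be non-empty. Frames outside every
    segment are left unchanged.
    """
    out = list(track)
    for start, end in boundaries:
        if end <= start:
            raise ValueError(f"empty segment ({start}, {end})")
        m = mean(track[start:end])  # assumed choice of per-segment constant
        for i in range(start, end):
            out[i] = m
    return out


if __name__ == "__main__":
    # Made-up values for illustration only, not measurements from the study.
    f5 = [4400, 4450, 4500]          # F5 in Hz, one value per frame
    b1 = [60, 70, 80, 100, 120]      # B1 in Hz, one value per frame
    print(hold_constant_per_utterance(f5))
    print(hold_constant_per_segment(b1, [(0, 2), (2, 5)]))
```

Running the script prints `[4450, 4450, 4450]` for the hypothetical F5 track and `[65, 65, 100, 100, 100]` for the hypothetical B1 track split into two segments. The mean is only one plausible choice of constant; the abstract does not say how the constant values were chosen.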


Bibliographic reference.  Dyhr, Niels-Jorn / Elmlund, Marianne / Henriksen, Carsten (1994): "Preserving naturalness in synthetic voices while minimizing variation in formant frequencies and bandwidths", In ICSLP-1994, 751-754.