8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


An Investigation of Intensity Patterns for German

Oliver Jokisch, Marco Kuhne

Dresden University of Technology, Germany

The perceived quality of synthetic speech depends strongly on its prosodic naturalness. Sophisticated models for controlling duration and fundamental frequency (f0) in speech synthesis systems have been developed during the last decade, whereas speech intensity modeling is often considered algorithmically and perceptually less important. Starting from a syllable-based, trainable prosody model, the authors tested new factors of influence to improve the predicted intensity contour at the phonemic level. To this end, a German newsreader corpus was analyzed with respect to typical intensity patterns. The f0-intensity interaction has the most significant influence and was perceptually evaluated by 32 listeners who ranked 20 different stimuli. With an elementary, linear intensity model, modified natural speech degrades only slightly, by about 0.3 on the ITU-T-conformant MOS scale.
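As a rough illustration of what an elementary, linear f0-intensity model could look like, the following minimal sketch fits intensity (in dB) as a linear function of f0 (in semitones) by ordinary least squares. The abstract does not state the model's actual form, so the semitone scale, the 100 Hz reference, the function names, and the sample values are all assumptions made for illustration and are not taken from the paper.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert f0 in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


def predict_intensity(f0_hz, slope, offset, ref_hz=100.0):
    """Predict intensity in dB from f0 in Hz with the fitted line."""
    return slope * hz_to_semitones(f0_hz, ref_hz) + offset


if __name__ == "__main__":
    # Made-up illustrative values, NOT measurements from the paper's corpus.
    f0_values_hz = [100.0, 120.0, 150.0, 200.0]
    intensity_db = [60.0, 61.5, 63.5, 66.0]

    semitones = [hz_to_semitones(f) for f in f0_values_hz]
    slope, offset = fit_linear(semitones, intensity_db)
    print(f"slope = {slope:.3f} dB/semitone, offset = {offset:.3f} dB")
    print(f"predicted intensity at 170 Hz: "
          f"{predict_intensity(170.0, slope, offset):.2f} dB")
```

In such a setup the fitted slope would express how many dB the intensity changes per semitone of f0, so a single coefficient captures the f0-intensity interaction; whether the authors' model uses this parameterization is not stated in the abstract.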


Bibliographic reference. Jokisch, Oliver / Kuhne, Marco (2003): "An investigation of intensity patterns for German", in EUROSPEECH-2003, 165-168.