EUROSPEECH 2003 - INTERSPEECH 2003
The perceived quality of synthetic speech strongly depends on its prosodic naturalness. Sophisticated models for controlling duration and fundamental frequency (f0) in speech synthesis systems have been developed during the last decade. Speech intensity modeling, by contrast, is often considered algorithmically and perceptually less important. Starting from a syllable-based, trainable prosody model, the authors tested new influencing factors to improve the predicted intensity contour at the phonemic level. To this end, a German newsreader corpus was analyzed with respect to typical intensity patterns. The f0-intensity interaction has the most significant influence and was perceptually evaluated by 32 listeners, who ranked 20 different stimuli. With an elementary linear intensity model, modified natural speech degrades only slightly, by about 0.3 on the ITU-T-conformant MOS scale.
Bibliographic reference. Jokisch, Oliver / Kuhne, Marco (2003): "An investigation of intensity patterns for German", In EUROSPEECH-2003, 165-168.