To make synthetic speech sound more spontaneous, we have introduced several duration control methods based on word language model probability and on pronunciation variant selection algorithms. In earlier publications we considered these algorithms in isolation. In this paper we combine the change of the speaking rate according to the language model probability with an indirect change of the speaking rate. The latter is achieved by a pronunciation variant selection algorithm based on a variant sequence model.
To evaluate the quality of the different approaches and to compare them to the canonical synthesis (as the state-of-the-art system), we performed several absolute category rating listening tests. In addition, we conducted the same test with natural speech to provide a further evaluation criterion. The results show that a suitable sequence of pronunciation variants achieves a significantly lower listening effort and a higher mean opinion score (MOS) than the canonical pronunciation, for both synthetic and natural speech samples.
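For readers unfamiliar with absolute category rating tests, the sketch below shows how a mean opinion score is commonly derived from listener ratings: it is the arithmetic mean of the category ratings, here assumed to lie on a 1 to 5 scale. The function names, the normal-approximation confidence interval, and the example ratings are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mos(ratings):
    """Mean opinion score: arithmetic mean of ACR ratings (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings are assumed to lie on a 1-5 scale")
    return mean(ratings)


def ci95_half_width(ratings):
    """Half-width of an approximate 95% confidence interval for the MOS.

    Uses the normal approximation 1.96 * s / sqrt(n); this is a common
    convention, not necessarily the analysis used in the paper.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are required")
    return 1.96 * stdev(ratings) / sqrt(len(ratings))


# Hypothetical ratings for one stimulus, invented purely for illustration.
example_ratings = [4, 5, 3, 4]
example_mos = mos(example_ratings)
example_ci = ci95_half_width(example_ratings)
```

Comparing two conditions (for example canonical pronunciation against selected variants) would then amount to comparing their scores together with the associated intervals or a suitable significance test.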
Bibliographic reference. Werner, Steffen / Hoffmann, Rüdiger (2007): "Spontaneous speech synthesis by pronunciation variant selection - a comparison to natural speech", In INTERSPEECH-2007, 1781-1784.