Three speech rate control methods for HMM-based speech synthesis were compared in large-scale subjective evaluations. The methods are 1) synthesizing speech from HMMs trained on corpora recorded at the target speech rate, 2) stretching or shrinking utterance durations proportionally during waveform generation, and 3) determining state durations by the maximum likelihood (ML) criterion under a constraint on the total utterance duration. The results indicated that proportional shrinking had significant advantages at fast speech rates, whereas HMMs trained on slow speech had a slight advantage at slow speech rates. We also found an advantage for proportionally shrunk speech from a synthesizer trained on slow speech corpora.
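The difference between methods 2) and 3) can be illustrated with a minimal sketch. It assumes each state duration is modeled by a Gaussian with a mean and a variance, in which case the ML durations under a fixed total duration have a simple closed form: each mean is shifted in proportion to its variance. The abstract does not give the formulas used in the paper, so the function names, the Gaussian assumption, and the toy numbers below are illustrative only.

```python
def proportional_durations(means, target_total):
    """Method 2 (sketch): scale every state duration by the same factor."""
    scale = target_total / sum(means)
    return [m * scale for m in means]


def ml_constrained_durations(means, variances, target_total):
    """Method 3 (sketch): ML durations under a total-duration constraint.

    Assumes Gaussian duration models. Maximizing the summed log-likelihood
    subject to sum(d) == target_total gives d_i = m_i + rho * var_i, with
    rho = (target_total - sum(m)) / sum(var).
    """
    rho = (target_total - sum(means)) / sum(variances)
    return [m + rho * v for m, v in zip(means, variances)]


# Hypothetical state duration statistics, in frames.
means = [10.0, 20.0, 10.0]
variances = [1.0, 8.0, 1.0]
target_total = 30.0  # shorter than sum(means) == 40, i.e. a faster rate

prop = proportional_durations(means, target_total)
ml = ml_constrained_durations(means, variances, target_total)
print(prop)  # [7.5, 15.0, 7.5]
print(ml)    # [9.0, 12.0, 9.0]
```

Both results sum to the target of 30 frames, but the ML variant takes most of the reduction from the high-variance middle state, whereas proportional scaling shrinks all states by the same ratio. A practical system would also need to round to whole frames and keep every duration positive, which this sketch omits.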
Bibliographic reference. Kato, Tsuneo / Yamada, Makoto / Nishizawa, Nobuyuki / Oura, Keiichiro / Tokuda, Keiichi (2011): "Large-scale subjective evaluations of speech rate control methods for HMM-based speech synthesizers", In INTERSPEECH-2011, 1845-1848.