INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Large-Scale Subjective Evaluations of Speech Rate Control Methods for HMM-Based Speech Synthesizers

Tsuneo Kato (1), Makoto Yamada (1), Nobuyuki Nishizawa (1), Keiichiro Oura (2), Keiichi Tokuda (2)

(1) KDDI R&D Laboratories Inc., Japan
(2) Nagoya Institute of Technology, Japan

Three speech rate control methods for HMM-based speech synthesis were compared in large-scale subjective evaluations. The methods are 1) synthesizing speech from HMMs trained on corpora recorded at the target speech rate, 2) proportionally stretching or shrinking utterance durations in waveform generation, and 3) determining state durations by the maximum likelihood (ML) criterion under a constraint on utterance duration. The results indicated that proportional shrinking had a significant advantage at fast rates, whereas HMMs trained on slow speech had a slight advantage at slow rates. We also found an advantage for proportionally shrunk speech from a synthesizer trained on slow speech corpora.
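As a rough illustration of how methods 2 and 3 differ, the sketch below assumes each HMM state has a Gaussian duration model with a mean and variance in frames. This is the usual setup in HMM-based synthesis, but the abstract does not spell it out. Under that assumption, maximizing the duration likelihood subject to a fixed total duration T gives d_k = m_k + rho * v_k, with rho = (T - sum of m_k) / (sum of v_k). Proportional scaling is modeled here as uniform scaling of the mean durations, which is a simplification: the paper applies it in waveform generation. The function names and the numbers are hypothetical and are not taken from the paper.

```python
from typing import List


def proportional_durations(means: List[float], target_total: float) -> List[float]:
    """Scale every state duration by the same factor (simplified method 2)."""
    scale = target_total / sum(means)
    return [m * scale for m in means]


def ml_constrained_durations(
    means: List[float], variances: List[float], target_total: float
) -> List[float]:
    """Maximum likelihood durations under a total-duration constraint (method 3).

    Assumes independent Gaussian state duration models. States with a
    larger variance absorb more of the change in total duration.
    """
    rho = (target_total - sum(means)) / sum(variances)
    return [m + rho * v for m, v in zip(means, variances)]


# Hypothetical three-state example: 20 frames at the normal rate, 15 frames wanted.
means = [5.0, 10.0, 5.0]
variances = [1.0, 4.0, 1.0]

prop = proportional_durations(means, 15.0)
ml = ml_constrained_durations(means, variances, 15.0)
print(prop)
print(ml)
```

Both results sum to the target duration, but the ML solution shortens the high-variance middle state more than the other two. A practical synthesizer would also round durations to whole frames and enforce a minimum of one frame per state, which this sketch omits.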


Bibliographic reference. Kato, Tsuneo / Yamada, Makoto / Nishizawa, Nobuyuki / Oura, Keiichiro / Tokuda, Keiichi (2011): "Large-scale subjective evaluations of speech rate control methods for HMM-based speech synthesizers", in INTERSPEECH-2011, 1845-1848.