Previous studies in HMM-based TTS have shown that acceleration parameters help generate smoother trajectories with less distortion. However, this effect has never been investigated in formal objective and subjective tests. In this paper, the role of acceleration parameters in trajectory generation is studied in depth. We show that discarding the acceleration parameters introduces only a small additional distortion, yet human subjects can easily perceive the resulting quality degradation because saw-tooth-like trajectories are commonly generated. We therefore choose the upper- and lower-bounded envelopes of the saw-tooth trajectories for further analysis. Experimental results show that both envelope trajectories have larger objective distortions; however, speech synthesized from the envelope trajectories is perceptually transparent with respect to the reference. The findings of this perceptual study facilitate efficient implementation of low-cost TTS systems, as well as low bit rate speech coding and reconstruction.
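For readers unfamiliar with the terminology, the following is a minimal sketch of how delta and acceleration (delta-delta) parameters are typically derived from a static feature trajectory. The window coefficients, the edge-replication padding, and the names `apply_window` and `dynamic_features` are illustrative assumptions; the abstract does not state which windows the authors used.

```python
import numpy as np

# Assumed 3-tap windows of the kind commonly used in HMM-based TTS toolkits;
# the abstract does not specify the actual coefficients.
DELTA_WINDOW = np.array([-0.5, 0.0, 0.5])   # first-order (delta)
ACCEL_WINDOW = np.array([1.0, -2.0, 1.0])   # second-order (acceleration)


def apply_window(static, window):
    """Apply a 3-tap window along time to a 1-D trajectory.

    The first and last frames are replicated so the output has the
    same number of frames as the input (an assumed boundary rule).
    """
    padded = np.pad(static, 1, mode="edge")
    # mode="valid" yields sum_k window[k] * padded[t + k] for each frame t
    return np.correlate(padded, window, mode="valid")


def dynamic_features(static):
    """Return (delta, acceleration) trajectories for a static trajectory."""
    static = np.asarray(static, dtype=float)
    delta = apply_window(static, DELTA_WINDOW)
    accel = apply_window(static, ACCEL_WINDOW)
    return delta, accel
```

With these assumed windows, a quadratic static trajectory such as `[0, 1, 4, 9, 16]` yields a constant acceleration of 2 on its interior frames. Discarding the acceleration parameters, as studied in the paper, corresponds to dropping the second of the two windows during parameter generation.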
Bibliographic reference. Chen, Yi-Ning / Yan, Zhi-Jie / Soong, Frank K. (2010): "A perceptual study of acceleration parameters in HMM-based TTS", In INTERSPEECH-2010, 426-429.