ISCA Archive SpeechProsody 2010

Usages of an external duration model for HMM-based speech synthesis.

Javier Latorre, Sabine Buchholz, Masami Akamine

In this paper we analyze three approaches to improving the quality of an HMM-based speech synthesizer by means of an external duration model. The first approach uses the external duration model in the standard way, to define the phone durations during synthesis. The second is a novel approach that uses the phone durations to create additional context features for the decision-tree clustering. The third combines the previous two approaches. A subjective evaluation showed a quality improvement over the baseline for all three approaches, although for different reasons. The standard approach improves the duration estimation. The second approach degrades the duration estimation but improves the logF0 and aperiodicity by better modeling their dependencies on duration. Finally, the combined approach benefits from the improvements of the other two and yields the best result, with ca. 16% higher preference than the baseline among native English speakers.

Index Terms: speech synthesis, prosody, duration, HMM-based, external duration model

Cite as: Latorre, J., Buchholz, S., Akamine, M. (2010) Usages of an external duration model for HMM-based speech synthesis. Proc. Speech Prosody 2010, paper 073

  author={Javier Latorre and Sabine Buchholz and Masami Akamine},
  title={{Usages of an external duration model for HMM-based speech synthesis.}},
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 073}