16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Entropy-Based Sentence Selection for Speech Synthesis Using Phonetic and Prosodic Contexts

Takashi Nose (1), Yusuke Arao (2), Takao Kobayashi (2), Komei Sugiura (3), Yoshinori Shiga (3), Akinori Ito (1)

(1) Tohoku University, Japan
(2) Tokyo Institute of Technology, Japan
(3) NICT, Japan

This paper proposes a sentence selection method that uses a maximum entropy criterion to construct recording scripts for speech synthesis. Conventional corpus design for speech synthesis often relies on a greedy algorithm that maximizes phonetic coverage. For statistical parametric speech synthesis, however, the balance of phonetic and prosodic contexts is as important as their coverage. To account for both phonetic and prosodic contextual balance during sentence selection, we introduce and maximize the entropy of phonetic and prosodic contexts such as biphones, triphones, accent, and sentence length. Objective experimental results show that the proposed method achieves better coverage and balance of contexts, and lower spectral and F0 distortions, than random and coverage-based sentence selection.
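As a rough illustration of the idea, entropy-driven selection could be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure from the paper: it assumes a greedy loop, a single pooled Shannon entropy over all context labels, and a hypothetical input format in which each candidate sentence is already converted to a list of context labels (for example biphone strings). The names `entropy` and `select_sentences` are invented for this example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def select_sentences(candidates, n_select):
    """Greedily pick sentences that maximize the entropy of pooled contexts.

    candidates: dict mapping a sentence id to its list of context labels
                (hypothetical format, assumed for this sketch).
    Returns the selected ids in pick order and the final pooled entropy.
    """
    selected = []
    pool = Counter()              # context counts of the script so far
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    for _ in range(min(n_select, len(remaining))):
        best_id, best_h = None, -1.0
        for sid, units in remaining.items():
            # Entropy of the script if this sentence were added.
            h = entropy(pool + Counter(units))
            if h > best_h:        # ties keep the earliest candidate
                best_id, best_h = sid, h
        selected.append(best_id)
        pool.update(remaining.pop(best_id))
    return selected, entropy(pool)
```

A coverage-based criterion would score a candidate by how many unseen labels it adds, whereas this score also penalizes a script dominated by a few frequent contexts, which is the balance property the abstract emphasizes. How the entropies of the different context types are combined is not stated here, so the single pooled distribution above is purely an assumption.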

Bibliographic reference. Nose, Takashi / Arao, Yusuke / Kobayashi, Takao / Sugiura, Komei / Shiga, Yoshinori / Ito, Akinori (2015): "Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts", In INTERSPEECH-2015, 3491-3495.