This paper proposes a sentence selection method that uses a maximum entropy criterion to construct recording scripts for speech synthesis. In conventional corpus design for speech synthesis, a greedy algorithm that maximizes phonetic coverage is often used. For statistical parametric speech synthesis, however, the balance of phonetic and prosodic contexts is as important as their coverage. To account for both phonetic and prosodic contextual balance in sentence selection, we introduce and maximize the entropy of phonetic and prosodic contexts such as biphone, triphone, accent, and sentence length. Objective experimental results show that the proposed method achieves better coverage and balance of contexts, and reduces spectral and F0 distortions, compared with random and coverage-based sentence selection.
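As a rough illustration of the entropy criterion, the following minimal sketch greedily adds the candidate sentence that most increases the Shannon entropy of the pooled context counts. The abstract does not specify the search procedure or how the different context types are combined, so the greedy loop, the single pooled label set, and the names `entropy` and `select_sentences` are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def select_sentences(candidates, n_select):
    """Greedy entropy-based selection (illustrative assumption).

    candidates: list of sentences, each a list of context labels
                (e.g. biphone strings). Hypothetical input format.
    Returns the indices of the selected sentences, in selection order,
    and the entropy of the context counts of the selected set.
    """
    selected = []
    counts = Counter()
    remaining = set(range(len(candidates)))
    for _ in range(min(n_select, len(candidates))):
        best_idx, best_h = None, -1.0
        # Iterate in index order so ties go to the lowest index.
        for i in sorted(remaining):
            h = entropy(counts + Counter(candidates[i]))
            if h > best_h:
                best_idx, best_h = i, h
        selected.append(best_idx)
        counts.update(candidates[best_idx])
        remaining.remove(best_idx)
    return selected, entropy(counts)


# Toy data: each sentence is reduced to a list of made-up biphone labels.
toy_candidates = [
    ["a-b", "a-b", "a-b"],  # repeats one context, adds no balance
    ["a-b", "b-c"],
    ["c-d", "d-e"],
    ["a-b"],
]
chosen, h = select_sentences(toy_candidates, 2)
print(chosen, h)
```

In this toy case the selection favors sentences 1 and 2, whose four distinct contexts each occur once, over sentence 0, which only repeats a single context. A coverage-only criterion would not distinguish between sets that cover the same contexts with different frequency balance.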
Bibliographic reference. Nose, Takashi / Arao, Yusuke / Kobayashi, Takao / Sugiura, Komei / Shiga, Yoshinori / Ito, Akinori (2015): "Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts", In INTERSPEECH-2015, 3491-3495.