Sixth ISCA Workshop on Speech Synthesis
This paper presents a corpus-based approach to communicative speech synthesis. As an initial attempt to synthesize speech with the expressiveness desired in human-human or human-machine dialog, we chose a "good news" style and a "bad news" style. We used a 10-hour speech corpus in a "neutral" style together with smaller good-news and bad-news corpora, each consisting of two to three hours of speech from the same speaker. We trained target HMMs for each style and synthesized speech using unit databases that contained speech in the relevant style as well as neutral speech. Listening tests showed that listeners recognized the intended communicative styles and that a considerably high mean opinion score (MOS) for naturalness was achieved with rather small, style-specific corpora.
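The abstract describes synthesis that draws on a unit database mixing style-specific and neutral speech, with targets predicted by style-specific HMMs. The sketch below is a rough illustration only. It shows a generic Viterbi unit-selection search over such a pooled database. The unit features (mean F0, duration), the cost weights, and the fixed penalty for off-style units are all hypothetical assumptions. They are not the cost function used in XIMERA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One candidate segment in the pooled database (hypothetical features)."""
    uid: str
    phone: str
    style: str   # "good_news", "bad_news" or "neutral"
    f0: float    # mean F0 in Hz
    dur: float   # duration in seconds


@dataclass(frozen=True)
class Target:
    """Per-phone target, assumed to come from a style-specific HMM."""
    phone: str
    f0: float
    dur: float


def target_cost(t, u, style, style_penalty=1.0):
    # Distance to the predicted prosody, plus an assumed flat penalty
    # for units whose style differs from the requested one (e.g. neutral).
    cost = abs(u.f0 - t.f0) / 20.0 + abs(u.dur - t.dur) / 0.02
    if u.style != style:
        cost += style_penalty
    return cost


def concat_cost(prev, cur):
    # F0 jump at the join only; a real system would also compare spectra.
    return abs(prev.f0 - cur.f0) / 20.0


def select_units(targets, database, style):
    """Viterbi search for the lowest total-cost unit sequence."""
    cands = [[u for u in database if u.phone == t.phone] for t in targets]
    if any(not c for c in cands):
        raise ValueError("no candidate unit for some target phone")
    # best[i][j] = (cumulative cost, backpointer) for candidate j of target i
    best = [[(target_cost(targets[0], u, style), None) for u in cands[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in cands[i]:
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, u), k)
                for k, p in enumerate(cands[i - 1])
            )
            row.append((cost + target_cost(targets[i], u, style), back))
        best.append(row)
    # Trace back from the cheapest candidate for the last target.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(cands[i][j])
        j = best[i][j][1]
    return path[::-1]


db = [
    Unit("n1", "a", "neutral", 120.0, 0.10),
    Unit("g1", "a", "good_news", 150.0, 0.09),
    Unit("n2", "i", "neutral", 125.0, 0.08),
    Unit("g2", "i", "good_news", 155.0, 0.08),
]
targets = [Target("a", 150.0, 0.09), Target("i", 155.0, 0.08)]
print([u.uid for u in select_units(targets, db, "good_news")])  # prints ['g1', 'g2']
```

Because off-style units are penalized rather than excluded, the search still falls back to a neutral unit when no style-matched candidate exists for a phone. This is one plausible way a small style corpus could be backed by a larger neutral one.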
Bibliographic reference. Sakai, Shinsuke / Ni, Jinfu / Maia, Ranniery / Tokuda, Keiichi / Tsuzaki, Minoru / Toda, Tomoki / Kawai, Hisashi / Nakamura, Satoshi (2007): "Communicative speech synthesis with XIMERA: a first step", In SSW6-2007, 28-33.