Using distinct and appropriate synthetic voices for the characters in a children's story would make a TTS-based digital storyteller system more engaging and entertaining, and would also help listeners comprehend the story better. However, automatically predicting appropriate voices for storybook characters is a non-trivial problem.
In this paper, we present a data-driven approach to predicting the most appropriate voices for different characters in children's stories based on salient character attributes. We use Mechanical Turk both to identify the character attributes that are most salient in evoking listeners' perception that a specific character should have a particular voice, and to label the voices in our collection with attribute tags. We use Naive Bayes to model the relationship between attributes and voices. An objective evaluation of our system yields results significantly above chance, showing our approach to be viable.
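The abstract says only that Naive Bayes models the attribute-to-voice relationship. As a rough illustration, here is a minimal categorical Naive Bayes sketch. It assumes each annotation is a dictionary of attribute tags paired with a voice label and uses add-one (Laplace) smoothing. The attribute names, values, voice IDs, and the smoothing choice are all hypothetical placeholders, not the attributes, voices, or settings used in the paper.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical annotations: attribute tags a crowd worker might assign,
# paired with the voice they describe. Illustrative only, not from the paper.
TRAINING = [
    ({"gender": "female", "age": "child", "demeanor": "cheerful"}, "voice_A"),
    ({"gender": "female", "age": "child", "demeanor": "shy"}, "voice_A"),
    ({"gender": "male", "age": "elderly", "demeanor": "gruff"}, "voice_B"),
    ({"gender": "male", "age": "adult", "demeanor": "gruff"}, "voice_B"),
    ({"gender": "male", "age": "child", "demeanor": "cheerful"}, "voice_C"),
    ({"gender": "male", "age": "child", "demeanor": "shy"}, "voice_C"),
]


@dataclass
class NaiveBayesVoiceModel:
    alpha: float = 1.0  # add-one (Laplace) smoothing constant, an assumption
    class_counts: Counter = field(default_factory=Counter)
    # value_counts[voice][attribute][value] -> number of annotations seen
    value_counts: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(Counter)))
    # attr_values[attribute] -> set of values observed for that attribute
    attr_values: dict = field(default_factory=lambda: defaultdict(set))

    def fit(self, examples):
        """Count voice priors and per-voice attribute-value frequencies."""
        for attrs, voice in examples:
            self.class_counts[voice] += 1
            for attr, value in attrs.items():
                self.value_counts[voice][attr][value] += 1
                self.attr_values[attr].add(value)
        return self

    def log_score(self, attrs, voice):
        """log P(voice) + sum_i log P(attr_i = value_i | voice), smoothed.

        Assumes every annotation specifies every attribute, so the number
        of annotations for a voice is the denominator for each attribute.
        """
        n_voice = self.class_counts[voice]
        score = math.log(n_voice / sum(self.class_counts.values()))
        for attr, value in attrs.items():
            # Treat the queried value as a possible outcome even if unseen.
            n_values = len(self.attr_values.get(attr, set()) | {value})
            count = self.value_counts[voice].get(attr, Counter())[value]
            score += math.log(
                (count + self.alpha) / (n_voice + self.alpha * n_values))
        return score

    def rank_voices(self, attrs):
        """Return all known voices, most appropriate first."""
        return sorted(self.class_counts,
                      key=lambda v: self.log_score(attrs, v), reverse=True)

    def predict(self, attrs):
        """Return the single highest-scoring voice."""
        return self.rank_voices(attrs)[0]


model = NaiveBayesVoiceModel().fit(TRAINING)
print(model.rank_voices({"gender": "male", "age": "elderly", "demeanor": "gruff"}))
```

Returning a full ranking rather than only the top voice is one possible design choice. A storyteller could then fall back to the next-best voice when the top choice is already assigned to another character, which keeps the character voices distinct.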
Index Terms: Speech synthesis, TTS, expressive speech, child-directed speech applications.
Bibliographic reference. Greene, Erica / Mishra, Taniya / Haffner, Patrick / Conkie, Alistair (2012): "Predicting character-appropriate voices for a TTS-based storyteller system", In INTERSPEECH-2012, 2210-2213.