11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Machine Learning for Text Selection with Expressive Unit-Selection Voices

Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew

Ohio State University, USA

We show that a ranking model produced by machine learning outperforms two baselines on the task of selecting texts for creating a unit-selection synthesis voice with good domain coverage. The model learns to predict the estimated utility of an utterance from features relating it to the utterances selected so far and to a corpus of target utterances. Our analyses indicate that this discriminative approach continues to work well even though rich prosodic and non-prosodic features significantly expand the search space beyond what greedy methods have previously handled.
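
As a rough illustration of the selection loop described above, the following sketch repeatedly scores each remaining candidate by features relating it to the utterances selected so far and to a target corpus, then picks the highest-scoring one. It is not the authors' implementation: the unit definition (adjacent word pairs), the two features, the linear scorer, the hand-set weights, and all function names are assumptions made for illustration. In the paper the ranking model is learned, and the features include prosodic information.

```python
from collections import Counter


def units(utterance):
    """Hypothetical stand-in for synthesis units: adjacent word pairs."""
    words = utterance.lower().split()
    return set(zip(words, words[1:]))


def features(candidate, covered, target_counts):
    """Relate a candidate to the selection so far and to the target corpus."""
    new_units = units(candidate) - covered
    total = sum(target_counts.values()) or 1
    # Share of target-corpus unit occurrences this candidate would newly cover.
    gain = sum(target_counts[u] for u in new_units) / total
    length = float(len(candidate.split()))
    return [gain, length]


def score(feats, weights):
    """Linear utility estimate; a learned ranker would supply the weights."""
    return sum(w * f for w, f in zip(weights, feats))


def greedy_select(candidates, targets, weights, k):
    """Pick up to k utterances, rescoring the pool after every selection."""
    target_counts = Counter(u for t in targets for u in units(t))
    covered, selected = set(), []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        best = max(
            pool,
            key=lambda c: score(features(c, covered, target_counts), weights),
        )
        pool.remove(best)
        selected.append(best)
        covered |= units(best)
    return selected


if __name__ == "__main__":
    candidates = ["the cat sat down", "a dog ran fast", "the cat ran"]
    targets = ["the cat sat down", "the cat ran"]
    # Hand-set weights (assumed): reward coverage gain, lightly penalize length.
    print(greedy_select(candidates, targets, [1.0, -0.01], k=2))
```

Because the features depend on what has already been covered, each candidate's score changes as the selection grows, which is why the pool is rescored on every iteration.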

Bibliographic reference. Espinosa, Dominic / White, Michael / Fosler-Lussier, Eric / Brew, Chris (2010): "Machine learning for text selection with expressive unit-selection voices", in INTERSPEECH-2010, 1125-1128.