We show that a ranking model produced by machine learning outperforms two baselines when applied to the task of selecting texts for use in creating a unit-selection synthesis voice with good domain coverage. The model learns to predict the estimated utility of an utterance based on features relating it to the utterances selected so far and a corpus of target utterances. Our analyses indicate that our discriminative approach continues to work well even though the presence of rich prosodic and non-prosodic features significantly expands the search space beyond what has previously been handled by greedy methods.
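The selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The word-bigram `units` function (a stand-in for real synthesis units), the two features, and the fixed linear `weights` (a stand-in for a learned ranking model) are all hypothetical choices made for illustration. The sketch shows only the general shape: score each candidate using features that relate it to the utterances selected so far and to a target corpus, then greedily take the best one.

```python
from collections import Counter


def units(text):
    """Hypothetical stand-in for synthesis units: the set of word bigrams."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def features(candidate, covered, target_counts):
    """Features relating a candidate to the selected set and the target corpus."""
    new_units = units(candidate) - covered
    total = sum(target_counts.values()) or 1
    # Share of target-corpus unit occurrences this candidate would newly cover.
    gain = sum(target_counts[u] for u in new_units) / total
    # Utterance length in words, as a rough proxy for recording cost.
    length = float(len(candidate.split()))
    return [gain, length]


def score(feats, weights):
    """Assumed linear scorer; a trained ranking model would replace this."""
    return sum(w * f for w, f in zip(weights, feats))


def greedy_select(candidates, target_corpus, weights, k):
    """Pick up to k utterances, re-scoring the pool after every selection."""
    target_counts = Counter(u for t in target_corpus for u in units(t))
    covered, selected = set(), []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        best = max(
            pool,
            key=lambda c: score(features(c, covered, target_counts), weights),
        )
        selected.append(best)
        covered |= units(best)
        pool.remove(best)
    return selected


if __name__ == "__main__":
    candidates = ["the cat sat", "the cat sat down", "a dog ran"]
    target = ["the cat sat down", "a dog ran"]
    print(greedy_select(candidates, target, weights=[1.0, -0.01], k=2))
```

Because the features depend on what has already been selected, every remaining candidate is re-scored after each pick. In this sketch, the utility of an utterance changes as coverage grows.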
Bibliographic reference. Espinosa, Dominic / White, Michael / Fosler-Lussier, Eric / Brew, Chris (2010): "Machine learning for text selection with expressive unit-selection voices", in INTERSPEECH-2010, 1125-1128.