ISCA Archive SSW 2007

How (not) to select your voice corpus: random selection vs. phonologically balanced

Tanya Lambert, Norbert Braunschweiler, Sabine Buchholz

This paper compares the effect of two different voice corpus selection methods on the overall quality of unit selection-based text-to-speech (TTS) voices trained on the resulting corpora. The first selection method aims to maximize the coverage of stressed as well as unstressed diphones (phonologically balanced: Phonbal), while the second method simply selects sentences at random (Random). We show that, as expected, the Phonbal method results in better phonetic and phonological coverage for both the training sentences and unseen test sentences. However, we also provide evidence from an objective evaluation and a subjective listening test that the Random method results in better overall voice quality when only automatic corpus annotation tools (such as forced alignment) are used, and potentially even with manual annotation. This result has general implications for the fast creation of TTS voices.
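
To illustrate the two selection strategies contrasted above, here is a minimal Python sketch. It assumes a greedy coverage heuristic for the Phonbal side, which is a common way to maximize unit coverage but is not confirmed by the abstract as the authors' exact procedure. The function names, the toy corpus, and the convention of marking stress with a digit suffix on vowels (so that stressed and unstressed diphones count as distinct units) are all hypothetical.

```python
import random

def diphones(phones):
    """Return the set of adjacent phone pairs (diphones) in one sentence."""
    return set(zip(phones, phones[1:]))

def select_phonbal(corpus, n):
    """Greedy sketch (assumed heuristic): repeatedly pick the sentence that
    adds the most diphones not yet covered by earlier picks."""
    covered, chosen = set(), []
    remaining = dict(corpus)
    for _ in range(min(n, len(remaining))):
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) - covered))
        covered |= diphones(remaining.pop(best))
        chosen.append(best)
    return chosen, covered

def select_random(corpus, n, seed=0):
    """Pick n sentences uniformly at random and report the diphones they cover."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(corpus), min(n, len(corpus)))
    covered = set()
    for s in chosen:
        covered |= diphones(corpus[s])
    return chosen, covered

# Hypothetical toy corpus: sentence id -> phone sequence (stress digit on vowels).
corpus = {
    "s1": ["h", "e1", "l", "o0"],
    "s2": ["h", "e1", "l", "p"],
    "s3": ["l", "o0", "h", "e1"],
    "s4": ["p", "a1", "t"],
}

phonbal_ids, phonbal_cov = select_phonbal(corpus, 2)
random_ids, random_cov = select_random(corpus, 2)
print(phonbal_ids, len(phonbal_cov))
print(random_ids, len(random_cov))
```

The sketch only measures coverage. It says nothing about the resulting voice quality, which is the quantity the paper actually evaluates.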


Cite as: Lambert, T., Braunschweiler, N., Buchholz, S. (2007) How (not) to select your voice corpus: random selection vs. phonologically balanced. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 264-269

@inproceedings{lambert07_ssw,
  author={Tanya Lambert and Norbert Braunschweiler and Sabine Buchholz},
  title={{How (not) to select your voice corpus: random selection vs. phonologically balanced}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={264--269}
}