ISCA Archive SSW 2016

Utterance Selection Techniques for TTS Systems Using Found Speech

Pallavi Baljekar, Alan W. Black

The goal of this paper is to investigate data selection techniques for found speech. Unlike clean, phonetically balanced datasets recorded specifically for synthesis, found speech contains a lot of noise that may not be labeled well, and it may contain utterances recorded under varying channel conditions. These channel variations and other noise distortions can sometimes be useful because they add diverse data to the training set; in other cases, however, they can be detrimental to the system. This work investigates various metrics for detecting noisy data that degrade the system's performance on a held-out test set. Starting from a seed set of 100 utterances, we incrementally add a fixed set of utterances and determine which metrics capture the misaligned and noisy data. We report results on three datasets: an artificially degraded set of clean speech, a single-speaker database of found speech, and a multi-speaker database of found speech. The experiments on these three datasets use male speakers; we also show that comparable results are obtained on a female multi-speaker corpus.
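The incremental selection loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and it rests on several assumptions. First, each candidate utterance is assumed to carry a single noise score from some metric, where lower means cleaner. Second, utterances are assumed to be added in fixed-size batches. Third, `evaluate` is a hypothetical callback that would train a voice on the given utterances and return a held-out error, for example mel-cepstral distortion. The function names `incremental_training_sets` and `best_training_set` are illustrative only.

```python
from typing import Callable, Dict, List, Sequence


def incremental_training_sets(
    seed: Sequence[str],
    candidates: Dict[str, float],
    batch_size: int,
    lower_is_cleaner: bool = True,
) -> List[List[str]]:
    """Return cumulative training sets: seed, seed + batch 1, seed + batches 1-2, ...

    `candidates` maps a (hypothetical) utterance ID to a per-utterance noise
    score from an assumed metric; utterances ranked cleanest are added first.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Rank candidates by the metric; Python's sort is stable, so ties keep input order.
    ranked = sorted(candidates, key=candidates.get, reverse=not lower_is_cleaner)
    sets = [list(seed)]
    current = list(seed)
    for start in range(0, len(ranked), batch_size):
        # Each step appends the next fixed-size batch (the last batch may be smaller).
        current = current + ranked[start:start + batch_size]
        sets.append(current)
    return sets


def best_training_set(
    sets: List[List[str]], evaluate: Callable[[List[str]], float]
) -> List[str]:
    """Pick the training set with the lowest held-out error.

    `evaluate` is a hypothetical train-and-score callback (lower is better).
    On ties, min() keeps the earliest, i.e. smallest, set.
    """
    return min(sets, key=evaluate)


if __name__ == "__main__":
    seed = ["s1", "s2"]  # stand-in for the 100-utterance seed set
    scores = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.8}
    noisy = {"b", "d"}  # toy ground truth, for illustration only

    def toy_error(utts: List[str]) -> float:
        # Toy stand-in for "train, then measure held-out error":
        # more data helps, each noisy utterance hurts.
        return 1 / len(utts) + 0.5 * sum(u in noisy for u in utts)

    sets = incremental_training_sets(seed, scores, batch_size=2)
    print(best_training_set(sets, toy_error))  # ['s1', 's2', 'a', 'c']
```

In this toy run, the metric ranks the two noisy utterances last. The error curve therefore bottoms out before they are added. This is the behaviour a useful metric should show when the incremental curves on a held-out set are compared.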


doi: 10.21437/SSW.2016-30

Cite as: Baljekar, P., Black, A.W. (2016) Utterance Selection Techniques for TTS Systems Using Found Speech. Proc. 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), 184-189, doi: 10.21437/SSW.2016-30

@inproceedings{baljekar16_ssw,
  author={Pallavi Baljekar and Alan W. Black},
  title={{Utterance Selection Techniques for TTS Systems Using Found Speech}},
  year=2016,
  booktitle={Proc. 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9)},
  pages={184--189},
  doi={10.21437/SSW.2016-30}
}