ISCA Archive SpeechProsody 2010

Automatic feature selection from a large number of features for phone duration prediction

Gabriel Webster, Sabine Buchholz, Javier Latorre

This research investigates automatic feature selection for phone duration prediction in computer text-to-speech (TTS), selecting from a large set of 242 candidate features. Two methods for avoiding overfitting to the training data are evaluated. Experiments with an American English voice corpus show that automatic feature selection using n-fold cross-validation combined with a simple per-feature improvement threshold achieved a duration prediction accuracy of 22.5 ms RMSE, a relative error reduction of 7.8% over a manually selected baseline feature set.

Index Terms: speech synthesis, phone duration prediction, automatic feature selection, feature set
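
As a rough illustration of the idea in the abstract, the sketch below runs a greedy forward feature selection in which a candidate feature is kept only if it lowers the n-fold cross-validated RMSE by at least a fixed threshold. Everything specific here is an assumption made for illustration and is not taken from the paper: the greedy forward search order, the toy lookup-table duration model, the function names (`fit_table`, `cv_rmse`, `greedy_select`), the threshold value, and the synthetic data.

```python
import math
from collections import defaultdict

def fit_table(rows, durations, feats):
    """Toy model (assumption): mean duration per tuple of selected feature values."""
    sums = defaultdict(lambda: [0.0, 0])
    for row, dur in zip(rows, durations):
        key = tuple(row[f] for f in feats)
        sums[key][0] += dur
        sums[key][1] += 1
    global_mean = sum(durations) / len(durations)
    table = {key: total / count for key, (total, count) in sums.items()}
    return table, global_mean

def cv_rmse(rows, durations, feats, n_folds=5):
    """RMSE over all held-out predictions from n-fold cross-validation."""
    squared_error = 0.0
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        test = [i for i in range(len(rows)) if i % n_folds == fold]
        table, global_mean = fit_table(
            [rows[i] for i in train], [durations[i] for i in train], feats)
        for i in test:
            key = tuple(rows[i][f] for f in feats)
            # Feature combinations unseen in training fall back to the global mean.
            prediction = table.get(key, global_mean)
            squared_error += (prediction - durations[i]) ** 2
    return math.sqrt(squared_error / len(rows))

def greedy_select(rows, durations, candidates, threshold=0.1, n_folds=5):
    """Add the best remaining feature while it improves CV RMSE by >= threshold (ms)."""
    selected = []
    best = cv_rmse(rows, durations, selected, n_folds)
    while True:
        scored = [(cv_rmse(rows, durations, selected + [f], n_folds), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        score, feature = min(scored)
        if best - score < threshold:
            break  # per-feature improvement too small: stop to limit overfitting
        selected.append(feature)
        best = score
    return selected, best

# Synthetic data (hypothetical): duration depends on phone identity and stress only.
base_ms = {"a": 90.0, "t": 50.0, "s": 110.0}
rows = [{"phone": "ats"[i % 3], "stressed": (i // 3) % 2, "noise": i % 7}
        for i in range(60)]
durations = [base_ms[r["phone"]] + 20.0 * r["stressed"] for r in rows]

selected, best = greedy_select(rows, durations, ["phone", "stressed", "noise"])
print(selected, best)
```

On this toy data the irrelevant `noise` feature is rejected because adding it does not reduce the held-out error, which is the kind of overfitting guard the threshold is meant to provide. A real system would substitute its own duration model and a much larger candidate set.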


Cite as: Webster, G., Buchholz, S., Latorre, J. (2010) Automatic feature selection from a large number of features for phone duration prediction. Proc. Speech Prosody 2010, paper 013

@inproceedings{webster10_speechprosody,
  author={Gabriel Webster and Sabine Buchholz and Javier Latorre},
  title={{Automatic feature selection from a large number of features for phone duration prediction}},
  year=2010,
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 013}
}