This paper investigates automatic feature selection for phone duration prediction in text-to-speech (TTS) synthesis, selecting from a large set of 242 candidate features. Two methods for avoiding overfitting to the training data are evaluated. Experiments with an American English voice corpus show that automatic feature selection using n-fold cross-validation combined with a simple per-feature improvement threshold achieved a duration prediction accuracy of 22.5 ms RMSE, a relative error reduction of 7.8% over a manually selected baseline feature set.
Index Terms: speech synthesis, phone duration prediction, automatic feature selection, feature set
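The selection scheme summarized in the abstract (n-fold cross-validation combined with a per-feature improvement threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the greedy forward search, the toy "mean duration per feature-value combination" learner, the relative form of the threshold (`min_gain`), the feature names, and the synthetic data.

```python
import math
import random
from collections import defaultdict


def fit_predict(train, test, feats):
    """Toy learner (assumption): predict the mean training duration for each
    combination of feature values, backing off to the global mean."""
    global_mean = sum(d for _, d in train) / len(train)
    cells = defaultdict(lambda: [0.0, 0])  # key -> [sum of durations, count]
    for x, d in train:
        key = tuple(x[f] for f in feats)
        cells[key][0] += d
        cells[key][1] += 1
    preds = []
    for x, _ in test:
        key = tuple(x[f] for f in feats)
        if key in cells:
            total, n = cells[key]
            preds.append(total / n)
        else:
            preds.append(global_mean)
    return preds


def cv_rmse(data, feats, n_folds=5):
    """Pooled RMSE (ms) over n held-out folds; fold k holds rows k, k+n, ..."""
    sq_err = 0.0
    for k in range(n_folds):
        test = data[k::n_folds]
        train = [row for i, row in enumerate(data) if i % n_folds != k]
        preds = fit_predict(train, test, feats)
        sq_err += sum((p - d) ** 2 for p, (_, d) in zip(preds, test))
    return math.sqrt(sq_err / len(data))


def select_features(data, candidates, n_folds=5, min_gain=0.05):
    """Greedy forward selection (assumption): add the feature that lowers the
    cross-validated RMSE most, and stop once the best candidate improves it
    by less than the relative threshold min_gain."""
    selected = []
    best = cv_rmse(data, selected, n_folds)
    remaining = list(candidates)
    while remaining:
        scores = {f: cv_rmse(data, selected + [f], n_folds) for f in remaining}
        f_best = min(scores, key=scores.get)
        if best - scores[f_best] < min_gain * best:
            break  # improvement too small: likely fitting noise
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected, best


# Synthetic data (hypothetical): two informative features and one pure-noise one.
rng = random.Random(0)
data = []
for _ in range(600):
    x = {
        "is_vowel": rng.randint(0, 1),
        "stressed": rng.randint(0, 1),
        "noise": rng.randint(0, 1),
    }
    duration_ms = 60 + 40 * x["is_vowel"] + 20 * x["stressed"] + rng.gauss(0, 5)
    data.append((x, duration_ms))

selected, best = select_features(data, ["is_vowel", "stressed", "noise"])
print(selected, round(best, 2))
```

In this sketch the threshold serves as the guard against overfitting: a candidate is kept only if it reduces held-out error by a clear margin, so a feature that merely fits noise in the training folds is rejected.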
Cite as: Webster, G., Buchholz, S., Latorre, J. (2010) Automatic feature selection from a large number of features for phone duration prediction. Proc. Speech Prosody 2010, paper 013
@inproceedings{webster10_speechprosody,
  author={Gabriel Webster and Sabine Buchholz and Javier Latorre},
  title={{Automatic feature selection from a large number of features for phone duration prediction}},
  year=2010,
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 013}
}