ISCA Archive Eurospeech 1999

CART-based duration modeling using a novel method of extracting prosodic features

Paul Deans, Andrew Breen, Peter Jackson

The prediction of accurate segmental durations remains a difficult problem when synthesising speech from text. Inaccurate durations are often perceptually prominent and detract from the naturalness of the speech. For a concatenative system, a statistical approach is an excellent way of predicting segmental durations. More specifically, a CART (Classification And Regression Tree) method is appropriate [1], but only if it has been correctly trained with data that reflects a phoneme’s characteristics. A feature-set is used to describe the flavour of a phoneme when building CART trees. We describe a novel method in which BT’s Laureate Text-to-Speech (TTS) system automatically supplies the prosodic information that makes up the feature-set, which is then used as training data for building a CART tree. This tree, in turn, is used to predict segmental durations. Extracting salience (derived from a metrical analysis of the text) and the other prosodic and segmental features in this way is a novel concept. CART trees consistently show that the salience feature, in particular, has a large effect on the duration of a phoneme. The paper describes this concept in detail and shows the importance of salience. An evaluation of the effectiveness of CART-based duration modelling against the rule-based Laureate TTS method is given in the results.
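
To make the idea of a CART duration model concrete, the following is a minimal sketch of a regression tree that maps a small feature-set to a segmental duration. It is not the authors' implementation and does not use Laureate. The feature names (`salience`, `is_vowel`, `phrase_final`), the integer salience scale, and every duration value are assumptions invented for illustration. The tree is grown with the standard CART regression criterion, choosing at each node the split that most reduces the sum of squared errors.

```python
# Toy CART-style regression tree for phoneme durations.
# ASSUMPTION: feature names and all numbers are invented for illustration;
# they are not taken from the paper or from the Laureate TTS system.

FEATURES = ("salience", "is_vowel", "phrase_final")  # hypothetical feature-set

# Each row is (salience, is_vowel, phrase_final); DURS holds durations in ms.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 0, 0),
        (2, 1, 0), (2, 1, 1), (0, 1, 1), (1, 0, 1), (0, 0, 1)]
DURS = [50.0, 70.0, 60.0, 90.0, 70.0, 120.0, 150.0, 95.0, 80.0, 65.0]


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, durs, min_leaf):
    """Return (feature_index, threshold, cost) with the lowest SSE, or None."""
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest,
        # so that the right-hand side is never empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [d for r, d in zip(rows, durs) if r[f] <= t]
            right = [d for r, d in zip(rows, durs) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (f, t, cost)
    return best


def build_tree(rows, durs, max_depth=3, min_leaf=2):
    """Grow the tree recursively; leaves store the mean duration."""
    split = best_split(rows, durs, min_leaf) if max_depth > 0 else None
    if split is None or split[2] >= sse(durs):
        return {"leaf": mean(durs)}
    f, t, _ = split
    left = [(r, d) for r, d in zip(rows, durs) if r[f] <= t]
    right = [(r, d) for r, d in zip(rows, durs) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [d for _, d in left],
                           max_depth - 1, min_leaf),
        "right": build_tree([r for r, _ in right], [d for _, d in right],
                            max_depth - 1, min_leaf),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its mean duration in ms."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


tree = build_tree(ROWS, DURS)
predicted_ms = predict(tree, (2, 1, 0))  # salient, non-final vowel
```

In a setup like the one the abstract describes, the rows would come from the prosodic and segmental features supplied automatically by the TTS front end rather than from a hand-written table, and the durations from measured speech data.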


doi: 10.21437/Eurospeech.1999-397

Cite as: Deans, P., Breen, A., Jackson, P. (1999) CART-based duration modeling using a novel method of extracting prosodic features. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1823-1826, doi: 10.21437/Eurospeech.1999-397

@inproceedings{deans99_eurospeech,
  author={Paul Deans and Andrew Breen and Peter Jackson},
  title={{CART-based duration modeling using a novel method of extracting prosodic features}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1823--1826},
  doi={10.21437/Eurospeech.1999-397}
}