This paper discusses both acoustical and textual correlates of prominence. Prominence, as we use it, is defined at the word level and is based on listener judgments. A selection of useful acoustic input features is tested for the classification of prominent words with the help of feed-forward nets. We use spoken sentences from many different speakers, taken from the Dutch Polyphone corpus of telephone speech. For an independent test set of 1,000 sentences, about 72% of the words are correctly classified as prominent or not. At the text input level, we also developed an algorithm that predicts prominence using linguistic/syntactic features derived from text only. This prediction agrees with the perceived prominence in 82.6% of the cases.
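Both reported figures (about 72% and 82.6%) are word-level agreement rates between a binary prominence decision and the listener-based label. As a minimal sketch of how such a rate can be computed, assuming one boolean per word: the function name `agreement_percentage` and the toy labels below are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def agreement_percentage(predicted: Sequence[bool], perceived: Sequence[bool]) -> float:
    """Percentage of words whose predicted prominence matches the perceived label.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(predicted) != len(perceived):
        raise ValueError("both sequences must hold one label per word")
    if len(predicted) == 0:
        raise ValueError("at least one word is required")
    # Count the words where the two binary labels coincide.
    matches = sum(p == q for p, q in zip(predicted, perceived))
    return 100.0 * matches / len(predicted)


# Toy labels (invented for this example): True means "prominent".
predicted = [True, False, False, True, False]
perceived = [True, False, True, True, False]

score = agreement_percentage(predicted, perceived)
print(f"{score:.1f}% agreement")
```

In this toy case four of the five words match. A rate of this kind counts prominent and non-prominent words alike, so it is best read alongside the proportion of prominent words in the test material.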
Cite as: Streefkerk, B.M., Pols, L.C.W., ten Bosch, L.F.M. (2001) Up to what level can acoustical and textual features predict prominence. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 811-814, doi: 10.21437/Eurospeech.2001-251
@inproceedings{streefkerk01_eurospeech,
  author={Barbertje M. Streefkerk and Louis C. W. Pols and Louis F. M. ten Bosch},
  title={{Up to what level can acoustical and textual features predict prominence}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={811--814},
  doi={10.21437/Eurospeech.2001-251}
}