There is general agreement that sentential vowel stress (called prominence by some authors) in American English is marked by pitch rise-falls, energy, and duration. No single cue is sufficient by itself; instead, talkers use combinations of these cues to signal stress in continuous speech. Based on a study of the stress-marking strategies of 15 talkers of American English, an algorithm was devised which labels vowels with three levels of stress: unstressed, stressed, and highly stressed. The algorithm is based on combinations of pitch rise-falls, relative energy, and duration. Pitch is determined automatically in all voiced regions of the sentence, and each region is then characterised as having rising, falling, or steady pitch. Sequences of three regions are examined to find the pitch rise-fall patterns which signal stress. The energy in the 0-2500 Hz band is determined throughout the utterance, and all energy measurements are made relative to the maximum energy in the sentence. If the energy of a vowel is within 11 dB of the maximum, the vowel is considered energy stressed. In the present implementation, duration is determined from hand labels and is corrected for prepausal effects. If two of the three cues are present, the vowel is labelled stressed. If the vowel has the highest energy, longest duration, and highest pitch, it is labelled highly stressed. If the vowel has very low energy relative to the loudest sound in the sentence, it is labelled unstressed no matter what the other two cues indicate. The algorithm was tested on 125 sentences of American English and found to perform very well: detailed analysis of the results shows that approximately 85% of the syllables are correctly stress labelled. Of the three cues, pitch stress was the most difficult.
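The decision rule described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The `Vowel` record, the function name `label_vowels`, and the `VERY_LOW_ENERGY_DB` threshold are hypothetical; the abstract gives no numeric value for "very low energy", and it does not say how the duration cue is thresholded, so both the pitch and the duration cue are taken here as precomputed booleans. Only the 11 dB energy criterion and the two-out-of-three rule come from the text.

```python
from dataclasses import dataclass

ENERGY_STRESS_DB = 11.0    # from the abstract: within 11 dB of the sentence maximum
VERY_LOW_ENERGY_DB = 30.0  # hypothetical value: the abstract does not state one


@dataclass
class Vowel:
    energy_db: float       # energy in the 0-2500 Hz band, in dB
    duration: float        # seconds, assumed already corrected for prepausal effects
    peak_pitch: float      # highest pitch on the vowel, in Hz
    pitch_rise_fall: bool  # assumed precomputed: a rise-fall pattern marks this vowel
    long_duration: bool    # assumed precomputed: the duration cue is present


def label_vowels(vowels):
    """Label each vowel 'unstressed', 'stressed', or 'highly stressed'."""
    max_energy = max(v.energy_db for v in vowels)
    max_duration = max(v.duration for v in vowels)
    max_pitch = max(v.peak_pitch for v in vowels)

    labels = []
    for v in vowels:
        drop = max_energy - v.energy_db  # dB below the loudest vowel
        if drop > VERY_LOW_ENERGY_DB:
            # Very low energy overrides the other two cues.
            labels.append("unstressed")
        elif (v.energy_db == max_energy
              and v.duration == max_duration
              and v.peak_pitch == max_pitch):
            # Highest energy, longest duration, and highest pitch.
            labels.append("highly stressed")
        else:
            cues = [v.pitch_rise_fall, drop <= ENERGY_STRESS_DB, v.long_duration]
            labels.append("stressed" if sum(cues) >= 2 else "unstressed")
    return labels
```

In this sketch the sentence maxima are taken over the vowels only, whereas the abstract measures energy relative to the maximum in the whole sentence; passing the sentence-wide maximum in as a parameter would be a straightforward variation.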
Bibliographic reference. Hieronymus, James L. (1989): "Automatic sentential vowel stress labelling", In EUROSPEECH-1989, 1226-1229.