In this work, we conduct a thorough study of acoustic-prosodic cues for prominence detection in speech. This study differs from previous work in several respects. In addition to widely used prosodic features, such as pitch, energy, and duration, we introduce the use of cepstral features. Furthermore, we evaluate the effect of different features, speaker dependency and variation, different classifiers, and contextual information. Our experiments on the Boston University Radio News Corpus show that although the cepstral features alone do not perform well, when combined with prosodic features they yield some performance gain and, more importantly, reduce much of the speaker variation in this task. We find that the previous context is more informative than the following context, and that their combination achieves the best performance. The final result using selected features with contextual information is significantly better than that reported in previous work.
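The abstract describes combining prosodic and cepstral features per syllable, adding contextual information from neighboring syllables, and feeding the result to a classifier. The following is a minimal sketch of that pipeline under stated assumptions: the feature values are random stand-ins (a real system would extract pitch, energy, duration, and cepstral coefficients from audio), the context window and the logistic-regression classifier are illustrative choices, and all names (`add_context`, `feats`, etc.) are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-syllable features: [pitch, energy, duration] plus a few
# cepstral coefficients. Random stand-ins here; a real system would compute
# these from the speech signal.
n_syll, n_pros, n_cep = 200, 3, 4
prosodic = rng.normal(size=(n_syll, n_pros))
cepstral = rng.normal(size=(n_syll, n_cep))
feats = np.hstack([prosodic, cepstral])          # feature combination

def add_context(X, prev=1, foll=1):
    """Concatenate features of neighboring syllables (zero-padded at edges)."""
    parts = []
    for off in range(-prev, foll + 1):
        shifted = np.zeros_like(X)
        if off < 0:
            shifted[-off:] = X[:off]             # previous syllables
        elif off > 0:
            shifted[:-off] = X[off:]             # following syllables
        else:
            shifted = X                          # current syllable
        parts.append(shifted)
    return np.hstack(parts)

# Previous + current + following context, as explored in the paper
X = add_context(feats, prev=1, foll=1)

# Synthetic prominence labels driven by current pitch and energy
y = (feats[:, 0] + feats[:, 1] > 0).astype(float)

# Minimal logistic-regression classifier trained by gradient descent
# (the paper compares several classifiers; this is just one simple choice)
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * X.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = float(((p > 0.5) == y).mean())
```

With one previous and one following syllable of context, each example carries three copies of the 7-dimensional feature vector (21 features total); widening `prev`/`foll` trades a larger input for more contextual evidence.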
Bibliographic reference: Jeon, Je Hun / Liu, Yang (2010): "Syllable-level prominence detection with acoustic evidence", in Proc. INTERSPEECH 2010, pp. 1772-1775.