Non-segmental duration feature extraction for prosodic classification

Amy Dashiell, Brian Hutchinson, Anna Margolis, Mari Ostendorf

This paper presents a set of novel duration features for detecting pitch accents and phrase boundaries. The features depend on articulatory timing rather than on segmental duration information. They are computed from detected syllable nuclei and boundaries, which are located using peaks and valleys in an energy contour, with additional information from a simple HMM phone manner class recognizer used to increase recall. In experiments on the hand-segmented TIMIT corpus, we obtain an F-measure greater than 90% for vowel detection. In prosody detection experiments on the BU Radio News corpus, the new features give performance similar to a segmental feature baseline for pitch accent detection and slightly worse performance for boundary detection, without the need for phonetic alignments.
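As a rough illustration of the peak-and-valley idea mentioned in the abstract, the sketch below picks syllable-nucleus candidates as local maxima of a frame-level energy contour and places a boundary at the lowest frame between adjacent nuclei. It is not the authors' procedure: the function names, the energy floor, the minimum dip of 6 dB, and the 10 ms hop are all assumptions made for the example, and the HMM manner class recognizer described in the abstract is omitted entirely.

```python
def detect_nuclei(contour, floor_db, min_dip_db=6.0):
    """Return (peaks, valleys) as frame indices into an energy contour in dB.

    Hypothetical rule: a peak is a local maximum at or above floor_db.
    Two neighbouring peaks count as separate nuclei only if the contour
    dips at least min_dip_db below the lower of the two between them.
    """
    candidates = [
        i for i in range(1, len(contour) - 1)
        if contour[i] > contour[i - 1]
        and contour[i] >= contour[i + 1]
        and contour[i] >= floor_db
    ]
    peaks, valleys = [], []
    for i in candidates:
        if not peaks:
            peaks.append(i)
            continue
        prev = peaks[-1]
        # Lowest frame strictly between the previous nucleus and this candidate.
        valley = min(range(prev + 1, i), key=lambda k: contour[k])
        dip = min(contour[prev], contour[i]) - contour[valley]
        if dip >= min_dip_db:
            peaks.append(i)
            valleys.append(valley)
        elif contour[i] > contour[prev]:
            # Shallow dip: treat as the same syllable and keep the higher peak.
            peaks[-1] = i
    return peaks, valleys


def inter_nucleus_intervals(peaks, hop_s=0.01):
    """Time in seconds between successive nuclei (assumed 10 ms frame hop)."""
    return [(b - a) * hop_s for a, b in zip(peaks, peaks[1:])]


# Synthetic contour (dB per frame), invented for this example only.
contour = [0, 5, 20, 6, 2, 8, 22, 21, 19, 23, 7, 1, 10, 18, 4]
peaks, valleys = detect_nuclei(contour, floor_db=10.0)
intervals = inter_nucleus_intervals(peaks)
# peaks == [2, 9, 13], valleys == [4, 11]
```

In this toy contour the maxima at frames 6 and 9 are separated by a dip of only 3 dB, so they are merged into a single nucleus at frame 9. The resulting inter-nucleus intervals are one example of a timing measure that needs no phone-level alignment.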

doi: 10.21437/Interspeech.2008-336

Cite as: Dashiell, A., Hutchinson, B., Margolis, A., Ostendorf, M. (2008) Non-segmental duration feature extraction for prosodic classification. Proc. Interspeech 2008, 1092-1095, doi: 10.21437/Interspeech.2008-336

