9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Non-Segmental Duration Feature Extraction for Prosodic Classification

Amy Dashiell, Brian Hutchinson, Anna Margolis, Mari Ostendorf

University of Washington, USA

This paper presents a set of novel duration features for detecting pitch accents and phrase boundaries. The features depend on articulatory timing rather than on segmental duration information. They are computed from detected syllable nuclei and boundaries, which are located using peaks and valleys in an energy contour, with additional information from a simple HMM phone manner class recognizer used to increase recall. In experiments on the hand-segmented TIMIT corpus, we obtain an F-measure greater than 90% for vowel detection. In prosody detection experiments on the BU Radio News corpus, the new features give performance similar to a segmental feature baseline for pitch accent detection and slightly worse performance for boundary detection, without the need for phonetic alignments.
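
As a rough illustration of the peak-and-valley idea mentioned in the abstract, the sketch below picks alternating peaks (candidate syllable nuclei) and valleys (candidate syllable boundaries) from an energy contour. It is not the authors' implementation: the function names `peaks_and_valleys` and `inter_nucleus_intervals`, the prominence threshold `delta`, and the inter-nucleus interval feature are all assumptions made for this example, and the HMM manner class recognizer is left out entirely.

```python
import math


def peaks_and_valleys(contour, delta):
    """Return (peaks, valleys) as frame indices of a 1-D contour.

    Hypothetical helper: a peak is accepted once the contour has dropped
    by more than `delta` below the running maximum, and a valley once it
    has risen by more than `delta` above the running minimum.
    """
    peaks, valleys = [], []
    hi_val, hi_idx = -math.inf, 0
    lo_val, lo_idx = math.inf, 0
    looking_for_peak = True

    for i, v in enumerate(contour):
        # Track the running extremes since the last accepted event.
        if v > hi_val:
            hi_val, hi_idx = v, i
        if v < lo_val:
            lo_val, lo_idx = v, i

        if looking_for_peak:
            if v < hi_val - delta:
                peaks.append(hi_idx)        # candidate syllable nucleus
                lo_val, lo_idx = v, i       # restart the valley search here
                looking_for_peak = False
        else:
            if v > lo_val + delta:
                valleys.append(lo_idx)      # candidate syllable boundary
                hi_val, hi_idx = v, i       # restart the peak search here
                looking_for_peak = True

    return peaks, valleys


def inter_nucleus_intervals(peaks, hop_s):
    """Time in seconds between consecutive nuclei (an assumed feature)."""
    return [(b - a) * hop_s for a, b in zip(peaks, peaks[1:])]


# Toy contour in dB with two clear energy humps, 10 ms frame hop assumed.
toy_contour = [0, 5, 10, 5, 0, 5, 10, 5, 0]
toy_peaks, toy_valleys = peaks_and_valleys(toy_contour, delta=3.0)
toy_intervals = inter_nucleus_intervals(toy_peaks, hop_s=0.01)
```

On the toy contour this yields nuclei at frames 2 and 6 with one boundary at frame 4. A larger `delta` suppresses small ripples and favors precision over recall, which is the kind of trade-off the recognizer in the abstract is said to help with.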

Bibliographic reference. Dashiell, Amy / Hutchinson, Brian / Margolis, Anna / Ostendorf, Mari (2008): "Non-segmental duration feature extraction for prosodic classification", in INTERSPEECH-2008, 1092-1095.