9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Prosodic and Spectral Features Within Segment-Based Acoustic Modeling

Björn Schuller, Xiaohua Zhang, Gerhard Rigoll

Technische Universität München, Germany

Apart from the commonly used MFCC, PLP, and energy features, duration, low-order formants, pitch, and center-of-gravity-based features are also known to carry valuable information for phoneme recognition. This work investigates their individual performance within segment-based acoustic modeling. In addition, experiments that optimize a feature space spanned exclusively by this set are reported, using CFSS feature space optimization and speaker adaptation. All tests are carried out with support vector machines (SVM) on the open IFA corpus of 47 hand-labeled Dutch phonemes with a total of 178k instances. Extensive speaker-dependent vs. speaker-independent test runs are discussed, as are four different speaking styles ranging from informal to formal: informal and retold storytelling, and text read aloud with fixed and variable content. The results show the potential of these rather uncommon features, e.g., those based on F3 or pitch.
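To illustrate what a segment-based feature means here, the sketch below computes one fixed-length vector per labeled phoneme segment: the segment duration plus the mean and standard deviation of a frame-wise spectral center of gravity. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the 25 ms frame / 10 ms hop at 16 kHz (400/160 samples), the Hann window, and the choice of squared magnitudes as weights (power=2) are all hypothetical choices for illustration. Segment boundaries are assumed to come from hand labels, as in the IFA corpus.

```python
import numpy as np


def spectral_center_of_gravity(frame, sample_rate, power=2.0):
    """Spectral center of gravity (Hz) of one frame.

    Weighted mean frequency with |X(f)|**power as weights;
    power=2 (an assumption) weights by the power spectrum.
    """
    windowed = frame * np.hanning(len(frame))        # taper to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))         # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    weights = spectrum ** power
    total = weights.sum()
    return float((freqs * weights).sum() / total) if total > 0 else 0.0


def segment_features(samples, sample_rate, start_s, end_s,
                     frame_len=400, hop=160):
    """Hypothetical segment-level vector: [duration, mean CoG, std CoG].

    Frame-wise values are summarized over the whole segment, so every
    segment yields a fixed-length vector regardless of its duration.
    """
    seg = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
    cogs = [spectral_center_of_gravity(seg[i:i + frame_len], sample_rate)
            for i in range(0, len(seg) - frame_len + 1, hop)]
    cogs = np.asarray(cogs) if cogs else np.zeros(1)  # segment shorter than a frame
    return np.array([end_s - start_s, cogs.mean(), cogs.std()])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)               # synthetic 1 kHz test signal
    print(segment_features(tone, sr, 0.25, 0.75))     # duration 0.5 s, CoG near 1000 Hz
```

Vectors of this kind (extended with, e.g., pitch and formant statistics) could then be fed to an SVM classifier; summarizing frames by mean and standard deviation is only one possible choice of segment-level functionals.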


Bibliographic reference.  Schuller, Björn / Zhang, Xiaohua / Rigoll, Gerhard (2008): "Prosodic and spectral features within segment-based acoustic modeling", In INTERSPEECH-2008, 2370-2373.