ISCA Archive Interspeech 2008

Prosodic and spectral features within segment-based acoustic modeling

Björn Schuller, Xiaohua Zhang, Gerhard Rigoll

Apart from the commonly employed MFCC, PLP, and energy features, duration, low-order formants, pitch, and center-of-gravity-based features are also known to carry valuable information for phoneme recognition. This work investigates their individual performance within segment-based acoustic modeling. Experiments that optimize a feature space spanned exclusively by this set are also reported, using CFSS feature space optimization and speaker adaptation. All tests are carried out with SVM on the open IFA corpus of 47 hand-labeled Dutch phonemes with a total of 178k instances. Extensive speaker-dependent versus speaker-independent test runs are discussed, as well as four different speaking styles ranging from informal to formal: informal and retold storytelling, and reading aloud with fixed and variable content. Results show the potential of these rather uncommon features, such as those based on F3 or pitch.
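As an illustration of what a center-of-gravity-based feature can look like, the following minimal sketch computes a power-weighted spectral center of gravity for one segment. It is an assumption-laden example, not the authors' implementation: the abstract does not state the weighting, windowing, or frame setup used, and the function name `spectral_center_of_gravity` as well as the test tone are hypothetical.

```python
import cmath
import math


def spectral_center_of_gravity(samples, sample_rate):
    """Power-weighted mean frequency (Hz) of one segment.

    Assumption: power weighting and a plain DFT without windowing;
    the paper's exact definition is not given in the abstract.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("segment is empty")

    total_power = 0.0
    weighted_sum = 0.0
    # Only the non-negative frequency bins (0 .. n/2) are needed for real input.
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        freq = k * sample_rate / n  # center frequency of bin k in Hz
        total_power += power
        weighted_sum += freq * power

    if total_power == 0.0:
        return 0.0  # silent segment: no meaningful center of gravity
    return weighted_sum / total_power


# Hypothetical test segment: a 1 kHz tone sampled at 8 kHz, 64 samples,
# chosen so that the tone falls exactly on a DFT bin.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
cog = spectral_center_of_gravity(tone, sample_rate)
```

For a pure tone that falls on a DFT bin, the center of gravity coincides with the tone frequency. For a phoneme segment, it summarizes where the spectral energy is concentrated, which is what makes it a candidate per-segment feature alongside duration, formants, and pitch.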

doi: 10.21437/Interspeech.2008-121

Cite as: Schuller, B., Zhang, X., Rigoll, G. (2008) Prosodic and spectral features within segment-based acoustic modeling. Proc. Interspeech 2008, 2370-2373, doi: 10.21437/Interspeech.2008-121

  author={Björn Schuller and Xiaohua Zhang and Gerhard Rigoll},
  title={{Prosodic and spectral features within segment-based acoustic modeling}},
  booktitle={Proc. Interspeech 2008},