ISCA Archive Interspeech 2009

Classifying clear and conversational speech based on acoustic features

Akiko Amano-Kusumoto, John-Paul Hosom, Izhak Shafran

This paper reports an investigation of features relevant for classifying two speaking styles, namely, conversational speaking style and clear (i.e., hyper-articulated) speaking style. Spectral and prosodic features were automatically extracted from speech and used with decision tree classifiers and multilayer perceptrons, which achieved accuracies of about 71% and 77%, respectively. More interestingly, we found that only about 9 of the 56 features are needed to capture most of the predictive power. While perceptual studies have shown that spectral cues are more useful than prosodic features for intelligibility [1], here we find that prosodic features are more important for classification.
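To make the idea of "a few features carrying most of the predictive power" concrete, the sketch below ranks features by how well a single threshold on each one separates clear from conversational speech (a decision stump, the one-split building block of a decision tree). This is an illustrative, swapped-in technique, not necessarily the authors' feature-selection procedure. The function names, feature names, and toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def stump_accuracy(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Best accuracy of a one-threshold rule on a single feature.

    Labels are 1 (clear) or 0 (conversational). Every midpoint between
    consecutive distinct sorted values is tried as a threshold, in both
    directions (above the threshold -> clear, or above -> conversational).
    Returns (best_accuracy, best_threshold).
    """
    n = len(values)
    distinct = sorted(set(values))
    # Candidate thresholds: one below every value, plus all midpoints.
    candidates = [distinct[0] - 1.0] + [
        (a + b) / 2 for a, b in zip(distinct, distinct[1:])
    ]
    best_acc, best_thr = 0.0, candidates[0]
    for thr in candidates:
        # Count hits for the rule "value > thr means clear".
        correct = sum((v > thr) == bool(y) for v, y in zip(values, labels))
        # The flipped rule gets exactly the remaining examples right.
        acc = max(correct, n - correct) / n
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


def rank_features(rows: List[List[float]], labels: List[int],
                  names: List[str]) -> List[Tuple[str, float]]:
    """Rank features by single-feature stump accuracy, best first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        acc, _ = stump_accuracy(column, labels)
        scores.append((name, acc))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data (NOT from the paper): three made-up features per example.
names = ["speaking_rate", "mean_f0", "spectral_tilt"]
rows = [
    [3.1, 180.0, -10.0],  # clear
    [3.4, 175.0, -12.0],  # clear
    [3.0, 190.0, -11.0],  # clear
    [5.2, 178.0, -11.5],  # conversational
    [4.9, 185.0, -10.5],  # conversational
    [5.5, 182.0, -12.5],  # conversational
]
labels = [1, 1, 1, 0, 0, 0]
ranking = rank_features(rows, labels, names)
print(ranking)
```

In this toy example the made-up speaking-rate feature separates the two styles perfectly, while the other two reach only 4 of 6 correct. A stump scores each feature in isolation, so it can miss features that are only useful in combination; a full decision tree or a wrapper method that retrains the classifier on candidate subsets would account for such interactions.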

[1] A. Kain, A. Amano-Kusumoto, and J.-P. Hosom, “Hybridizing conversational and clear speech to determine the degree of contribution of acoustic features to intelligibility,” Journal of the Acoustical Society of America, vol. 124, no. 4, pp. 2308–2319, 2008.

doi: 10.21437/Interspeech.2009-522

Cite as: Amano-Kusumoto, A., Hosom, J.-P., Shafran, I. (2009) Classifying clear and conversational speech based on acoustic features. Proc. Interspeech 2009, 1735-1738, doi: 10.21437/Interspeech.2009-522

  author={Akiko Amano-Kusumoto and John-Paul Hosom and Izhak Shafran},
  title={{Classifying clear and conversational speech based on acoustic features}},
  booktitle={Proc. Interspeech 2009},