10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Classifying Clear and Conversational Speech Based on Acoustic Features

Akiko Amano-Kusumoto, John-Paul Hosom, Izhak Shafran

Oregon Health & Science University, USA

This paper reports an investigation of features relevant for classifying two speaking styles, namely, conversational speaking style and clear (e.g. hyper-articulated) speaking style. Spectral and prosodic features were automatically extracted from speech and classified using decision tree classifiers and multilayer perceptrons, achieving accuracies of about 71% and 77%, respectively. More interestingly, we found that only about 9 of the 56 features are needed to capture most of the predictive power. While perceptual studies have shown that spectral cues are more useful than prosodic features for intelligibility [1], here we find that prosodic features are more important for classification.
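The idea that a small subset of features can carry most of the predictive power can be illustrated with a minimal sketch. This is not the authors' procedure: the paper used decision tree classifiers and multilayer perceptrons on 56 extracted acoustic features, whereas the sketch below swaps in a much simpler technique (ranking each feature by the accuracy of a single-threshold decision stump) and runs on synthetic data. All names (`make_data`, `stump_accuracy`, `rank_features`) and all numbers are hypothetical.

```python
import random


def make_data(n_samples=200, n_features=6, informative=(0, 1), seed=0):
    """Build synthetic two-class data (assumption: NOT real speech features).

    Label 0 stands in for "conversational" and label 1 for "clear".
    Only the features listed in `informative` differ between classes.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [
            rng.gauss(3.0 * label if j in informative else 0.0, 1.0)
            for j in range(n_features)
        ]
        X.append(row)
        y.append(label)
    return X, y


def stump_accuracy(values, labels):
    """Accuracy of a one-feature stump thresholded midway between class means."""
    class0 = [v for v, lab in zip(values, labels) if lab == 0]
    class1 = [v for v, lab in zip(values, labels) if lab == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    threshold = (mean0 + mean1) / 2.0
    correct = 0
    for v, lab in zip(values, labels):
        # Predict class 1 when the value lies on class 1's side of the threshold.
        predicted = 1 if (v > threshold) == (mean1 > mean0) else 0
        correct += predicted == lab
    return correct / len(labels)


def rank_features(X, y):
    """Return (feature_index, stump_accuracy) pairs, best feature first."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((j, stump_accuracy(column, y)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    X, y = make_data()
    for index, accuracy in rank_features(X, y):
        print(f"feature {index}: stump accuracy {accuracy:.2f}")
```

On this synthetic data the two informative features rank well above the noise features, which mirrors, in a much reduced form, the observation that a few features can account for most of a classifier's accuracy. Accuracy here is measured on the same data used to set the threshold, so it is optimistic; a real analysis would use held-out data.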


  1. A. Kain, A. Amano-Kusumoto, and J.-P. Hosom, “Hybridizing conversational and clear speech to determine the degree of contribution of acoustic features to intelligibility,” Journal of the Acoustical Society of America, vol. 124, no. 4, pp. 2308–2319, 2008.

Bibliographic reference.  Amano-Kusumoto, Akiko / Hosom, John-Paul / Shafran, Izhak (2009): "Classifying clear and conversational speech based on acoustic features", In INTERSPEECH-2009, 1735-1738.