Spoken dialogue researchers often use supervised machine learning to classify turn-level user affect from a set of turn-level features. The utility of sub-turn features has been less explored, because associating a variable number of sub-turn units with a single turn-level classification introduces complications. We present and evaluate several voting methods for using word-level pitch and energy features to classify turn-level user uncertainty in spoken dialogue data. Our results show that when linguistic knowledge regarding prosody and word position is introduced into a word-level voting model, classification accuracy improves significantly over both turn-level models and uninformed word-level models.
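As a rough illustration of what combining a variable number of word-level decisions into one turn-level label can look like, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify the voting schemes, so the function names, the 0.5 threshold, and the choice to up-weight the final word are all hypothetical assumptions made only to contrast an uninformed vote with a position-informed one.

```python
from typing import List, Sequence


def majority_vote(word_labels: Sequence[int]) -> int:
    """Uninformed vote: label the turn uncertain (1) if more than half
    of its word-level labels are uncertain (1)."""
    if not word_labels:
        raise ValueError("a turn needs at least one word")
    return int(2 * sum(word_labels) > len(word_labels))


def final_word_weights(n_words: int, final_weight: float = 3.0) -> List[float]:
    """Hypothetical position weighting: every word gets weight 1.0 except
    the last one, which gets `final_weight` (an illustrative assumption)."""
    if n_words < 1:
        raise ValueError("a turn needs at least one word")
    return [1.0] * (n_words - 1) + [final_weight]


def weighted_vote(word_probs: Sequence[float],
                  weights: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Informed vote: weighted average of word-level uncertainty
    probabilities, compared against a threshold."""
    if not word_probs or len(word_probs) != len(weights):
        raise ValueError("need one weight per word and at least one word")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(word_probs, weights)) / total
    return int(score > threshold)


# Word-level uncertainty probabilities for a three-word turn (made-up values).
probs = [0.2, 0.3, 0.9]
uniform = weighted_vote(probs, [1.0, 1.0, 1.0])
informed = weighted_vote(probs, final_word_weights(len(probs)))
```

With these made-up probabilities, uniform weights average to about 0.47 and the turn is labelled certain, while tripling the weight of the final word raises the score to 0.64 and flips the label. This only shows how positional knowledge can change a vote, not what the paper found.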
Cite as: Litman, D., Rotaru, M., Nicholas, G. (2009) Classifying turn-level uncertainty using word-level prosody. Proc. Interspeech 2009, 2003-2006, doi: 10.21437/Interspeech.2009-577
@inproceedings{litman09_interspeech,
  author={Diane Litman and Mihai Rotaru and Greg Nicholas},
  title={{Classifying turn-level uncertainty using word-level prosody}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2003--2006},
  doi={10.21437/Interspeech.2009-577}
}