Sensitivity to a user's emotional state offers promise for improving the state of the art in spoken dialog systems. In this work, we attempt to detect the speaker's states of confusion and surprise using prosodic features of the speaker's utterances. We collected a corpus of utterances in realistic settings using an experimental methodology designed to elicit confusion and surprise from users. Classification experiments using F0 and power features yielded up to a 27.2% improvement over baseline performance. Classification was most successful for the emotions that were most successfully elicited.
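As a rough illustration of what utterance-level F0 and power features can look like, the sketch below computes frame-wise power and an autocorrelation-based F0 estimate, then summarizes each contour with simple statistics. The abstract does not specify the feature extractor, frame sizes, or statistics used in the paper, so the function names, the 75 to 400 Hz search range, the 40 ms frames, and the chosen statistics are all assumptions made for illustration only.

```python
import math


def frame_power_db(frame):
    """Mean-square power of one frame in dB (the floor avoids log10(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def frame_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 from the autocorrelation peak within an assumed pitch range.

    Returns 0.0 when no lag has positive correlation (treated as unvoiced).
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


def prosodic_features(samples, sample_rate, frame_len=320, hop=160):
    """Summarize the F0 and power contours of one utterance (hypothetical feature set)."""
    f0_values, power_values = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power_values.append(frame_power_db(frame))
        f0 = frame_f0(frame, sample_rate)
        if f0 > 0.0:  # keep voiced frames only
            f0_values.append(f0)
    features = {}
    for name, values in (("f0", f0_values), ("power", power_values)):
        if values:
            features[name + "_mean"] = sum(values) / len(values)
            features[name + "_min"] = min(values)
            features[name + "_max"] = max(values)
            features[name + "_range"] = max(values) - min(values)
    return features


# Synthetic stand-in for an utterance: a 200 Hz tone, 0.2 s at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(1600)]
features = prosodic_features(tone, SAMPLE_RATE)
print(features)
```

A feature dictionary like this would be computed per utterance and passed to a classifier. Real speech would call for a more robust pitch tracker and voicing decision than this plain autocorrelation peak picker.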
Cite as: Kumar, R., Rosé, C.P., Litman, D.J. (2006) Identification of confusion and surprise in spoken dialog using prosodic features. Proc. Interspeech 2006, paper 1921-Wed2BuP.14, doi: 10.21437/Interspeech.2006-508
@inproceedings{kumar06_interspeech,
  author={Rohit Kumar and Carolyn P. Rosé and Diane J. Litman},
  title={{Identification of confusion and surprise in spoken dialog using prosodic features}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1921-Wed2BuP.14},
  doi={10.21437/Interspeech.2006-508}
}