In this paper, we advocate the use of word-level pitch features for detecting user emotional states during spoken tutoring dialogues. Prior research has primarily focused on turn-level features as predictors. We compute pitch features at the word level and resolve the problem of combining multiple sets of features per turn by using a word-level emotion model. Even under a very simple word-level emotion model, our results show that word-level features improve prediction over turn-level features. We find that the advantage of word-level features lies in better prediction of longer turns.
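To make the word-level idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pitch (f0) track given as `(time, f0)` pairs with `0` marking unvoiced frames, word boundaries given as `(word, start, end)` spans, and a hypothetical per-word classifier `classify_word`. The majority vote used to combine word-level predictions into one turn label is likewise an illustrative assumption about what a very simple word-level emotion model could look like.

```python
from collections import Counter
from statistics import mean


def pitch_features(f0_values):
    """Summarize the voiced f0 samples (Hz) of one segment as min, max, mean."""
    voiced = [f for f in f0_values if f > 0]  # assumption: 0 marks unvoiced frames
    if not voiced:
        return None  # no usable pitch in this segment
    return {"min": min(voiced), "max": max(voiced), "mean": mean(voiced)}


def word_level_features(f0_track, word_spans):
    """Compute pitch features per word instead of once per turn.

    f0_track:   list of (time_sec, f0_hz) samples for the whole turn
    word_spans: list of (word, start_sec, end_sec) from a word alignment
    """
    features = []
    for word, start, end in word_spans:
        samples = [f0 for t, f0 in f0_track if start <= t < end]
        features.append((word, pitch_features(samples)))
    return features


def predict_turn(word_features, classify_word):
    """Label a turn by majority vote over per-word predictions.

    classify_word is a hypothetical callable mapping a feature dict to a label.
    Words without voiced frames are skipped.
    """
    votes = Counter(
        classify_word(feats) for _, feats in word_features if feats is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

A turn with many words yields many feature vectors and therefore many votes, which is one plausible reading of why longer turns could benefit most from word-level features.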
Cite as: Rotaru, M., Litman, D.J. (2005) Using word-level pitch features to better predict student emotions during spoken tutoring dialogues. Proc. Interspeech 2005, 881-884, doi: 10.21437/Interspeech.2005-398
@inproceedings{rotaru05_interspeech,
  author={Mihai Rotaru and Diane J. Litman},
  title={{Using word-level pitch features to better predict student emotions during spoken tutoring dialogues}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={881--884},
  doi={10.21437/Interspeech.2005-398}
}