Many features have been proposed for detecting emotions from speech. Their detection performance is affected by changes in contextual parameters such as background noise, speaker variability, expressions and demographics. In this paper, we use a recent time-domain feature extraction technique for detecting emotional valence. We report the performance of the time-domain features on data from three different contexts: the Fearless-Steps challenge data for Sentiment Detection, with spontaneous emotions; an Indian-demography corpus with induced emotions; and the Berlin database of acted emotions (EmoDB). The data are pre-processed according to the environment in which they were captured, but the feature extraction that follows is identical. With these features, a RandomForest classifier yields an accuracy of 71% on the development set of the challenge data and 74% on the evaluation set, a significant improvement over the published baseline result of 49%. The same features, again with the RandomForest classifier, provide an accuracy of 75% on the Indian-demography corpus and 100% accuracy in classifying happy, sad and neutral emotions from EmoDB. These results are better than those obtained with other prevalent techniques, such as Long Short-Term Memory (LSTM) networks with spectrograms, and the RandomForest classifier with widely accepted features: OpenSMILE features and Mel-Frequency Cepstral Coefficients (MFCCs).
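The abstract does not spell out how the successive difference feature is computed. As a minimal sketch, assuming the feature is built from the first-order differences d[n] = x[n+1] - x[n] of the pre-processed time-domain waveform, the Python snippet below maps one utterance to a fixed-length vector using the same extraction regardless of corpus. The summary statistics chosen here (mean absolute difference, standard deviation of the differences, and sign-change rate) and the function names `successive_differences` and `successive_difference_features` are hypothetical illustrations, not the authors' definition.

```python
import math


def successive_differences(samples):
    """Return the first-order successive differences d[n] = x[n+1] - x[n]."""
    return [b - a for a, b in zip(samples, samples[1:])]


def successive_difference_features(samples):
    """Hypothetical utterance-level summary of the successive-difference sequence.

    ASSUMPTION: the three statistics below are illustrative stand-ins, not the
    paper's feature definition. Input is a pre-processed mono waveform given as
    a sequence of numbers.
    Returns [mean absolute difference, population std. dev. of differences,
    fraction of adjacent difference pairs whose signs flip].
    """
    d = successive_differences(list(samples))
    if len(d) < 2:
        raise ValueError("need at least three samples")
    n = len(d)
    mean_abs = math.fsum(abs(v) for v in d) / n
    mu = math.fsum(d) / n
    spread = math.sqrt(math.fsum((v - mu) ** 2 for v in d) / n)
    # A sign change is a strictly negative product of neighbouring differences;
    # zero-valued differences never count as a change.
    sign_changes = sum(1 for a, b in zip(d, d[1:]) if a * b < 0)
    return [mean_abs, spread, sign_changes / (n - 1)]


if __name__ == "__main__":
    # Toy "waveform" with differences [1, -1, -1, 1].
    print(successive_difference_features([0, 1, 0, -1, 0]))  # → [1.0, 1.0, 0.6666666666666666]
```

Because every utterance maps to the same three numbers, vectors like these could be stacked into a matrix and passed to a classifier such as scikit-learn's `RandomForestClassifier` through its `fit(X, y)` and `predict(X)` methods. That pairing mirrors the pipeline described in the abstract, but how the pieces connect here is an assumption.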
Cite as: Deshpande, G., Viraraghavan, V.S., Gavas, R. (2019) A Successive Difference Feature for Detecting Emotional Valence from Speech. Proc. Workshop on Speech, Music and Mind (SMM 2019), 36-40, doi: 10.21437/SMM.2019-8
@inproceedings{deshpande19_smm,
  author={Gauri Deshpande and Venkata Subramanian Viraraghavan and Rahul Gavas},
  title={{A Successive Difference Feature for Detecting Emotional Valence from Speech}},
  year=2019,
  booktitle={Proc. Workshop on Speech, Music and Mind (SMM 2019)},
  pages={36--40},
  doi={10.21437/SMM.2019-8}
}