Models of emotional prosody based on perception have typically required listeners to rate emotional expressions along the psychological dimensions of arousal, valence, and power. We propose a perception-based model that does not assume these psychological dimensions are the ones listeners use to differentiate emotional prosody. Instead, multidimensional scaling is used to identify three perceptual dimensions, which are then regressed onto a dynamic feature set that requires neither a training set nor normalization to a speaker's "neutral" expression. The model predictions for Dimensions 1 and 3 closely matched the perceptual model; however, only a moderately close match was observed for Dimension 2.
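The two-stage approach described above (multidimensional scaling of listener dissimilarities, followed by regression of the recovered dimensions onto acoustic features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: classical (Torgerson) MDS is used as a stand-in for whatever MDS variant the paper employs, and the stimuli, dissimilarity matrix, and "dynamic feature" matrix are all hypothetical toy data.

```python
import numpy as np

def classical_mds(d, k=3):
    """Recover k-dimensional coordinates from an (n, n) dissimilarity matrix
    via classical (Torgerson) MDS: double-center the squared dissimilarities
    and take the top-k eigenvectors scaled by sqrt(eigenvalue)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]            # top-k eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Toy data: 5 hypothetical stimuli with known 3-D "perceptual" coordinates,
# from which pairwise dissimilarities are computed (standing in for listener
# ratings of how different two emotional expressions sound).
rng = np.random.default_rng(0)
x_true = rng.normal(size=(5, 3))
d = np.linalg.norm(x_true[:, None] - x_true[None, :], axis=-1)

# Stage 1: MDS identifies three perceptual dimensions.
coords = classical_mds(d, k=3)                    # shape (5, 3)

# Stage 2: regress each perceptual dimension onto a (hypothetical) dynamic
# acoustic feature set via ordinary least squares.
features = rng.normal(size=(5, 4))                # 4 made-up acoustic features
betas, *_ = np.linalg.lstsq(features, coords, rcond=None)
```

Because the toy dissimilarities are Euclidean distances among 3-D points, classical MDS recovers the configuration exactly (up to rotation and reflection); with real perceptual ratings, the fit would only be approximate.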
Bibliographic reference: Patel, Sona / Shrivastav, Rahul (2011): "A preliminary model of emotional prosody using multidimensional scaling", in Proceedings of INTERSPEECH 2011, 2957-2960.