Interspeech'2005 - Eurospeech
This paper proposes a three-layer model for the perception of emotional speech. Much earlier work has focused on statistically characterizing the relationship between emotional speech and acoustic features. What is lacking is a model that takes vagueness and human perception into account. In the proposed model, five categories of emotion constitute the topmost layer. Primitive features, expressed linguistically as adjectives, constitute the middle layer, which describes the human perceptual aspect. Acoustic features constitute the bottommost layer. Three experiments followed by multidimensional scaling (MDS) analysis identified suitable primitive features. Fuzzy inference systems were then built to model the vague nature of the relationship between emotions and the primitive features. Acoustic features including F0, time duration, power envelope, and power spectrum were analyzed; subsequent regression analysis revealed correlations between the primitive features and the acoustic features. The experimental results and the resulting fuzzy inference systems show a significant relationship between emotions and the primitive features. The analysis also identifies acoustic features that correlate positively or negatively with the primitive features.
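The three-layer structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The adjective names ("bright", "heavy"), the emotion labels, the linear weights, the membership functions, and the two rules are all hypothetical placeholders chosen for illustration. The systems in the paper were derived from listening experiments and regression analysis, not from hand-picked values like these.

```python
# Illustrative three-layer pipeline:
# acoustic features -> primitive features (adjectives) -> emotion categories.
# Every name and number below is a hypothetical placeholder.

# Bottom -> middle layer: hypothetical linear weights mapping normalized
# acoustic features (roughly in [-1, 1]) to adjective scores.
WEIGHTS = {
    "bright": {"mean_f0": 0.8, "duration": -0.2, "power": 0.4},
    "heavy": {"mean_f0": -0.6, "duration": 0.5, "power": 0.1},
}


def clamp01(x):
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def primitive_scores(acoustic):
    """Map acoustic features to adjective scores in [0, 1]."""
    scores = {}
    for adjective, weights in WEIGHTS.items():
        raw = sum(w * acoustic[name] for name, w in weights.items())
        scores[adjective] = clamp01(0.5 + 0.5 * raw)
    return scores


def high(x):
    """Degree to which a score in [0, 1] counts as 'high'."""
    return clamp01(x)


def low(x):
    """Degree to which a score in [0, 1] counts as 'low'."""
    return 1.0 - clamp01(x)


# Middle -> top layer: hypothetical fuzzy rules. Each rule is a pair of
# (emotion, list of (adjective, membership function)) joined by fuzzy AND.
RULES = [
    ("joy", [("bright", high), ("heavy", low)]),
    ("sadness", [("bright", low), ("heavy", high)]),
]


def classify(acoustic):
    """Return a firing strength in [0, 1] for each emotion label."""
    scores = primitive_scores(acoustic)
    strengths = {}
    for emotion, conditions in RULES:
        # Fuzzy AND is taken as the minimum over the rule's conditions.
        strength = min(member(scores[adj]) for adj, member in conditions)
        # Rules sharing an emotion are combined with fuzzy OR (maximum).
        strengths[emotion] = max(strengths.get(emotion, 0.0), strength)
    return strengths


if __name__ == "__main__":
    print(classify({"mean_f0": 1.0, "duration": -1.0, "power": 1.0}))
```

The output is a graded strength per emotion, not a single hard label. This is the sense in which a fuzzy layer can represent vagueness in the mapping from adjectives to emotions.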
Bibliographic reference. Huang, Chun-Fang / Akagi, Masato (2005): "A multi-layer fuzzy logical model for emotional speech perception", In INTERSPEECH-2005, 417-420.