INTERSPEECH 2004 - ICSLP
Prosodic features have been shown to be important for discriminating between speech emotions, but they also serve a fundamental linguistic function. Variations caused by linguistic context act as noise in emotion classification and should be eliminated. This paper proposes a novel method that decomposes the raw, mixed prosodic features into features determined by linguistic context and features responsible for emotionality, and uses only the latter for emotion classification. In the method, the features determined by linguistic context are first predicted with a Generalized Regression Neural Network (GRNN) based on an analysis of neutral speech, and Linear Discriminant Analysis (LDA) is then applied to accomplish the decomposition. Experiments on Chinese emotional speech show that the emotional features estimated through this decomposition discriminate better between emotions and achieve much higher classification accuracy than the raw features.
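The GRNN prediction step can be illustrated with a minimal sketch. It assumes the standard GRNN formulation, in which the output for a query is a Gaussian-kernel weighted average of the training targets (equivalent to Nadaraya-Watson kernel regression). The function name `grnn_predict`, the smoothing width `sigma`, and the toy context encodings in the usage note are hypothetical and not taken from the paper. In the paper's setting, the training pairs would come from neutral speech (linguistic-context descriptors mapped to prosodic features); the subsequent LDA-based decomposition is not reproduced here.

```python
import math
from typing import Sequence


def grnn_predict(train_x: Sequence[Sequence[float]],
                 train_y: Sequence[float],
                 query: Sequence[float],
                 sigma: float = 1.0) -> float:
    """Hypothetical sketch of a GRNN regression output.

    Each training pattern acts as one pattern-layer unit with a Gaussian
    kernel; the prediction is the kernel-weighted average of the targets.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(train_x) == 0 or len(train_x) != len(train_y):
        raise ValueError("need equal-length, non-empty training data")
    # Pattern layer: squared Euclidean distance from the query to each pattern.
    d2 = [sum((q - t) ** 2 for q, t in zip(query, x)) for x in train_x]
    # Shift by the smallest distance before exponentiating to avoid all
    # weights underflowing to zero; the shift cancels in the ratio below.
    d_min = min(d2)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in d2]
    # Summation and output layers: weighted sum of targets / sum of weights.
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)
```

For example, with hypothetical neutral-speech pairs `train_x = [[0.0], [1.0], [2.0]]` (encoded contexts) and `train_y = [10.0, 20.0, 30.0]` (a prosodic value), `grnn_predict(train_x, train_y, [1.0], sigma=0.01)` returns 20.0, because the narrow kernel places essentially all weight on the matching context. Larger `sigma` values smooth the prediction toward the mean of the training targets; `sigma` is the only free parameter in this formulation, which is one reason GRNNs are convenient when training data are limited.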
Bibliographic reference. Jiang, Dan-Ning / Cai, Lian-Hong (2004): "Classifying emotion in Chinese speech by decomposing prosodic features", In INTERSPEECH-2004, 1325-1328.