EUROSPEECH 2003 - INTERSPEECH 2003
In this paper we explore the use of nonlinear Teager Energy Operator based features derived from multi-resolution sub-band analysis for classification of emotional/stressful speech. We propose a novel scheme for automatic sub-band weighting as a step towards a generic algorithm for understanding emotion or stress in speech. We evaluate the proposed algorithm using a corpus of audio material from a stressful military Soldier of the Quarter Board evaluation panel. We establish classification performance of emotional/stressful speech using an open speaker set with open test tokens. With the new frequency distribution based scheme, we obtain a relative reduction in detection error rate of 81.3% for stressed speech and 75.4% for neutral speech. The results suggest an important step forward in establishing an effective processing scheme for developing generic models of neutral and emotional speech.
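As a point of reference for the features mentioned above, the standard discrete-time Teager Energy Operator is commonly written as psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The following is a minimal sketch of that operator only, not the authors' implementation; the function name `teager_energy` and the test signal are assumptions made for illustration, and the sub-band decomposition and weighting scheme of the paper are not shown.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator on a sequence of samples.

    Returns psi[n] = x[n]^2 - x[n-1] * x[n+1] for the interior samples,
    so the output is two samples shorter than the input.
    (Illustrative helper; the name is an assumption, not from the paper.)
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure sinusoid A * cos(omega * n).
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]

psi = teager_energy(signal)

# For a pure sinusoid the operator is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure sinusoid the output depends on both amplitude and frequency, which is one reason the operator is usually applied per sub-band rather than to the full-band signal.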
Bibliographic reference. Rahurkar, Mandar A. / Hansen, John H.L. (2003): "Frequency distribution based weighted sub-band approach for classification of emotional/stressful content in speech", In EUROSPEECH-2003, 721-724.