Automatic classification of emotional speech is a challenging task with applications in speech synthesis and recognition. In this paper, an adaptive sinusoidal model (aSM), the extended adaptive Quasi-Harmonic Model (eaQHM), is applied to the analysis of emotional speech for classification purposes. The model parameters (amplitudes and frequencies) are used as classification features. Using a well-known database of narrowband expressive speech (SUSAS), we develop two separate Vector Quantizers (VQs) for classification, one for the amplitude features and one for the frequency features. We show that eaQHM outperforms the standard Sinusoidal Model in classification scores. However, classification based on a single feature is insufficient for achieving higher classification rates. We therefore suggest a combined amplitude-frequency classification scheme, in which the classification scores of the two VQs are weighted and ranked, and the decision is made based on the minimum value of this ranking. Experiments show that the proposed scheme achieves higher performance when the features are obtained from eaQHM. Future work can be directed toward different classifiers, such as HMMs or GMMs, and ultimately toward emotional speech transformation and synthesis.
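The combined decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each VQ yields a per-class distortion score, and that the weighting is a simple convex combination with an illustrative weight `w`; the actual weighting and ranking used in the paper may differ.

```python
def classify_combined(amp_scores, freq_scores, w=0.5):
    """Combined amplitude-frequency decision rule (illustrative sketch).

    amp_scores, freq_scores: dicts mapping emotion label -> VQ distortion
    score for that class (lower is better).
    w: assumed weight on the amplitude scores (not taken from the paper).
    Returns the label with the minimum weighted combined score.
    """
    combined = {
        label: w * amp_scores[label] + (1.0 - w) * freq_scores[label]
        for label in amp_scores
    }
    # Decision: the class whose weighted score ranks lowest wins.
    return min(combined, key=combined.get)


# Hypothetical per-class VQ distortions for a single test utterance.
amp = {"angry": 0.20, "neutral": 0.50, "lombard": 0.45}
freq = {"angry": 0.40, "neutral": 0.30, "lombard": 0.55}
print(classify_combined(amp, freq))
```

With these hypothetical scores, the amplitude and frequency evidence disagree on the runner-up classes, but the weighted combination resolves the decision in favor of the class with the lowest joint score.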
Bibliographic reference. Yakoumaki, Theodora / Kafentzis, George P. / Stylianou, Yannis (2014): "Emotional speech classification using adaptive sinusoidal modelling", in Proc. INTERSPEECH 2014, pp. 1361-1365.